Help:Advanced Semantic Search
From SCDimensions
Semantic Property Search
Semantic MediaWiki includes an easy-to-use query language which enables users to access the wiki's knowledge. The syntax of this query language is similar to the syntax of annotations in Semantic MediaWiki. This query language can be used on the special page Special:Ask, in concepts, and in inline queries. This page provides a short introduction to semantic search in general.
Introduction
Semantic queries specify two things:
- Which pages to select
- What information to display about those pages
All queries must state some conditions that describe what is asked for. You can select pages by name, namespace, category, and most importantly by property values. For example, the query
[[Make::Porsche]]
is a query for all pages with the "Make" property with a value of "Porsche". If you enter this in Special:Ask and click "Find results", SMW executes the query and displays the results as a simple table of all matching page titles. If there are many results, they can be browsed via the navigation links at the top and bottom of the query results, as in, for example, a query for all persons on semanticweb.org.
The second point matters when more information should be displayed. In the example above, one might be interested in the motors installed in the various Porsches. To display that on Special:Ask, one just enters the following into the printout box on the right:
?Motor Model
and SMW displays the same page titles and the values of the Motor Model property on those pages, if any. Printout statements may have some additional settings to further control how the property is displayed.
The most important part of the semantic search features in Semantic MediaWiki is a simple format for describing which pages should be displayed as the search result. Queries select wiki pages based on the information that has been specified for them using Categories, Properties, and possibly some other MediaWiki features such as a page's namespace. The following paragraphs introduce the main query features in SMW.
Categories and property values
In the introductory example, we gave the single condition [[Make::Porsche]] to describe which pages we were interested in. The markup text is exactly what you would otherwise write to assert that some page has this property and value. Putting it in a semantic query makes SMW return all such pages. This is a general scheme: The syntax for asking for pages that satisfy some condition is exactly the syntax for explicitly asserting that this condition holds.
The following queries show what this means:
- [[Category:Slot Car]] gives all pages directly or indirectly (through a sub-, subsub-, etc. category) in the category.
- [[Model::911]] gives all pages annotated as being about the 911 car model.
- [[Wheelbase::180mm]] gives all pages annotated as being about a slot car having a wheelbase of 180mm.
By using other categories or properties than above, we can already ask for pages which have certain annotations. Next let us combine those requirements:
[[Category:Slot Car]] [[Model::911]] [[Wheelbase::74mm]]
asks for every slot car with a Model of 911 and a 74mm wheelbase. In other words: when several conditions are written into one query, the result is narrowed down to those pages that meet all the requirements. Thus we have a logical AND. Queries can also include line breaks in order to make them more readable, so we could as well write:
[[Category:Slot Car]]
[[Model::911]]
[[Wheelbase::74mm]]
to get the same result as above. Note that queries only return the pages that are positively known to satisfy the required properties: if no wheelbase is given for some slot car, that car will not be selected.
When specifying property values, SMW will usually ignore any initial and trailing whitespace, so the two conditions [[Wheelbase::74mm]] and [[Wheelbase:: 74mm ]] mean the same. Datatypes such as number may have additional features such as ignoring commas that might be used to separate the thousands. SMW will also treat synonymous page names the same, just like MediaWiki would usually consider Semantic wiki, Semantic_wiki, and semantic wiki to refer to the same page.
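The AND behaviour described above can be illustrated with a small Python sketch. It is only a toy model of the semantics, not SMW's implementation; the page names and the `matches_all` helper are hypothetical.

```python
# Toy model of conjunctive query conditions (not SMW code).
# Each page maps property names to the set of values annotated on it.
pages = {
    "Car A": {"Category": {"Slot Car"}, "Model": {"911"}, "Wheelbase": {"74mm"}},
    "Car B": {"Category": {"Slot Car"}, "Model": {"911"}},  # no Wheelbase given
    "Car C": {"Category": {"Slot Car"}, "Model": {"356"}, "Wheelbase": {"74mm"}},
}


def matches_all(annotations, conditions):
    """Return True only if every (property, value) condition is positively known."""
    return all(value in annotations.get(prop, set()) for prop, value in conditions)


# Models: [[Category:Slot Car]] [[Model::911]] [[Wheelbase::74mm]]
conditions = [("Category", "Slot Car"), ("Model", "911"), ("Wheelbase", "74mm")]
result = sorted(name for name, ann in pages.items() if matches_all(ann, conditions))
print(result)
```

In this model, "Car B" is left out because it has no wheelbase annotation at all, which mirrors the rule that only positively known values count.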
Property values: wildcards and comparators
In the examples above, we gave very concrete property conditions, using «911» and «74mm» as values for properties. In many cases, one does not look for only one particular value, but for a whole range of values, such as all cars with a wheelbase greater than 60mm. In some cases one may even just look for all pages that have any value for a given property at all. For example, four-wheel-drive cars could be those which have a value for the property «Front Pinion Tooth Count». Such general conditions are possible with the help of comparators and wildcards.
- Wildcards are written as "+" and allow any value for a given condition. For example, [[born in::+]] returns all pages that have any value for the property «born in».
Comparators are special symbols like < or >. They are placed after :: in property conditions. SMW currently supports the following comparators:
- > and <: greater than/less than or equal
- !: unequal
- ~: «like» comparison for strings (disabled by default)
Comparators work only for property values, but not for conditions on categories. A wiki installation can limit which comparators are available, which is done by the administrator by modifying the value of $smwgQComparators as explained in the file SMW_Settings.php.
Greater than or equal, less than or equal
With numeric values, you often want to select pages with property values within a certain range. For example
[[Category:Slot Car]] [[Wheelbase::> 70 mm]] [[Wheelbase::<80 mm]]
asks for all cars that have wheelbases between 70 mm and 80 mm. Note that this takes advantage of the automatic unit conversion: even if the wheelbase of the car was set with [[Wheelbase::2.913in]] it would be recognized as a correct answer (provided that the datatype for Wheelbase understands both units). Note that the comparators mean greater than or equal and less than or equal; the equality symbol = is not needed.
Such range conditions on property values are mostly relevant if values can be ordered in a natural way. For example, it makes sense to ask [[start date::>May 6 2006]] but it is not really helpful to say [[homepage URL::>http://www.somewhere.org]].
If a datatype has no natural linear ordering, Semantic MediaWiki will just apply the alphabetical order to the normalised datavalues as they are used in the RDF export. You can thus use greater than and less than to select alphabetic ranges of a string property. For example, you could ask [[surname::>Do]] [[surname::<G]] to select surnames from «Do» up to «G». For wiki pages, the comparator refers to the name of the given page (without the namespace prefix).
Here and in all other uses of comparators, it might happen that a searched-for value really starts with a symbol like <. In this case, SMW can be prevented from interpreting the symbol as a comparator if a space is inserted after ::. For example, [[property:: <br>]] really searches for pages with the value «<br>» for the given property.
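The comparator rules above, including the escape via a leading space, can be summarised in a short Python sketch. The `split_comparator` helper is hypothetical and only models the behaviour described on this page; it is not how SMW parses queries internally.

```python
def split_comparator(raw_value):
    """Split the text that follows '::' into (comparator, value).

    Hypothetical helper modelling the rules described above.
    """
    # A space directly after '::' means the first symbol belongs to the value.
    if raw_value.startswith(" "):
        return "=", raw_value.strip()
    names = {">": ">=", "<": "<=", "!": "!=", "~": "like"}
    first = raw_value[:1]
    if first in names:
        # Whitespace around the remaining value is ignored.
        return names[first], raw_value[1:].strip()
    return "=", raw_value.strip()


print(split_comparator("> 70 mm"))
print(split_comparator(" <br>"))
```

The first call treats > as "greater than or equal", while the second keeps «<br>» as a literal value because of the leading space.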
Not equal
You can select pages that have a property value which is unequal to a given value. For example, [[Motor Model::!RX-4]] will select pages that have a motor model which is not «RX-4». Note that this query description does not look for pages which do not have the motor model RX-4. Rather, it looks for all pages that (also) have a motor model unequal to RX-4. In particular, pages that have no motor model at all cannot be the result of the above query.
As with the (default) equality comparator, the use of custom units may require rounding in numeric conversions that can lead to unexpected results. For example, [[height::!6.00 ft]] may still select someone whose height displays as «6.00 feet» simply because the exact numeric value is not really 6. In such situations, it might be more useful to query for pages that have a property value outside a certain range, expressed by taking a disjunction (see below) of conditions with < and >.
Like
The comparator ~ works only for properties of Type:String and Type:Geographic coordinate. For strings, a like condition uses '*' wildcards to match any sequence of characters and '?' to match any single character. For example, one could ask "[[Address::~*Park Place*]]" to select addresses containing the string «Park Place», or "[[Honorific::~M?.]]" to select both «Mr.» and «Ms.».
For coordinates, this comparator takes a coordinate and finds all points that are close to it (provided they match the other criteria). The default distance is 5 miles, or about 8.05 kilometers. This distance can be changed by adding the "distance=" parameter, which can take a value in either miles or kilometers. So, to find all pages with a coordinate value within 3 kilometers of the Eiffel Tower, you could use "[[Coordinates::~48.8584°N, 2.2945°E]]|distance=3 km".
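The string wildcards of the like comparator can be modelled in a few lines of Python. This is a sketch under the assumption that '*' and '?' behave exactly as described above; the helper names are hypothetical, and the comparison here is case-sensitive, which may differ from a real installation.

```python
import re


def like_to_regex(pattern):
    """Translate a like pattern ('*' = any sequence, '?' = any one character)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            # Every other character must match literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Mr.", "M?."), like("Mrs.", "M?."))
print(like("12 Park Place, London", "*Park Place*"))
```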
Unions of query results: disjunctions
Disjunctions are OR-conditions that admit several alternative conditions on query results. SMW has two ways of writing disjunctions in queries:
- The operator OR is used for taking the union of two queries.
- The operator || is used for disjunctions in values, page, and category names.
In any case, the disjunction requires that at least one (but maybe more than one) of the possible alternatives is satisfied (logical OR). For example, the query
[[Motor Model::RX-42]] OR [[Motor Model::RX-42B]]
describes all pages whose Motor Model property is equal to RX-42 or RX-42B. This can also be written with || as [[Motor Model::RX-42||RX-42B]]. In the latter case, «RX-42||RX-42B» describes a value that may be either of the two alternatives. Writing queries with || is usually more concise, but not all disjunctions can be written in this way. The following is an example that cannot be expressed with ||:
[[Motor Model::RX-42]] OR [[Category:Slot Car]]
The || syntax can be used not only in property values, but also with categories, like in the query [[Category:Software||RMS]].
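The two ways of writing disjunctions can be pictured as set operations. The following Python sketch is a toy model with hypothetical page names and a hypothetical `select` helper: || offers alternative values within one condition, while OR takes the union of two complete result sets.

```python
# Toy data: each page maps property names to the set of annotated values.
pages = {
    "Car A": {"Category": {"Slot Car"}, "Motor Model": {"RX-42"}},
    "Car B": {"Category": {"Slot Car"}, "Motor Model": {"RX-42B"}},
    "Car C": {"Category": {"Slot Car"}, "Motor Model": {"RX-4"}},
    "Spare Motor": {"Category": {"Motor"}, "Motor Model": {"RX-42"}},
}


def select(prop, alternatives):
    """Pages that have at least one of the alternative values for prop."""
    wanted = set(alternatives)
    return {name for name, ann in pages.items() if ann.get(prop, set()) & wanted}


# Models [[Motor Model::RX-42||RX-42B]]: alternatives inside one condition.
by_value = select("Motor Model", ["RX-42", "RX-42B"])

# Models [[Motor Model::RX-42]] OR [[Motor Model::RX-42B]]: union of two queries.
by_union = select("Motor Model", ["RX-42"]) | select("Motor Model", ["RX-42B"])

# Models [[Motor Model::RX-42]] OR [[Category:Slot Car]]: only possible with OR.
mixed = select("Motor Model", ["RX-42"]) | select("Category", ["Slot Car"])

print(by_value == by_union, sorted(mixed))
```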
Describing single pages
So far, all conditions depended on some annotation or other given within a page. But there are also conditions to directly select some pages, or pages from a given namespace.
Directly giving some page title (possibly including a namespace prefix), or a list of such page titles separated by ||, selects the pages with those names. An example is the query
[[Porsche||Mercedes||Drivers:John Doe]]
which has three results (at least if the pages exist). Note that the result does not display any namespace prefixes; see the hover box or status bar of the browser, or follow the links to determine the namespace. By restricting such a set with a property condition, one could ask, e.g., «Who of Gabriele Tarquini, Pierre Kaffer, David Brabham and Jorg Muller drove for Aston Martin?». But direct selection of pages is most useful if further properties of those pages are asked for, e.g. to simply print the car that Jorg Muller drove.
To select a category in this way, a : must be put before the category name. This avoids confusing [[Category:Slot Car]] (return all slot cars) and [[:Category:Slot Car]] (return the category «Slot Car»).
Restricting results to a namespace
A less strict way of selecting given pages is via namespaces. The default is to return pages in every namespace. To return pages in a particular namespace, specify the namespace with a «wildcard», e.g. write [[Help:+]] to return every page in the «Help» namespace. Since the main namespace usually has no prefix, write [[:+]] to select only pages in the main namespace.
Disjunctions work again with the || syntax as above. For example, to return pages in either the main or «User» namespace, write [[:+||User:+]]. To return pages in the «Category» namespace, a : is again needed in front of the namespace label to prevent confusion.
Subqueries and property chains
Enumerating multiple pages for a property is cumbersome and hard to maintain. For instance, to select all actors that are born in an Italian city one could write:
[[Category:Actor]] [[born in::Rome||Milan||Turin||Florence||...]]
To generate a list of all these Italian cities one could run another query
[[Category:City]] [[located in::Italy]]
and copy and paste the results into the first query. What one would like to do is to use the city query as a subquery within the actor query to obtain the desired result directly. Instead of a fixed list of page names for the property's value, a new query enclosed in <q> and </q> is inserted within the property condition. In this example, one can thus write:
[[Category:Actor]] [[born in::<q>[[Category:City]] [[located in::Italy]]</q>]]
Arbitrary levels of nesting are possible, though nesting might be restricted for a particular site to ensure performance. For another example, to select all cities of the European Union you could write:
[[Category:Cities]] [[located in::<q>[[member of::European Union]]</q>]]
In the above example, we essentially have constructed a chain of properties «located in» and «member of» to find things that are located in something which is a member of the EU. Queries can be written in a shorter form for this common case:
[[Category:Cities]] [[located in.member of::European Union]]
This query has the same meaning as above, but with far fewer special symbols required. In general, chains of properties are created by listing all properties separated by dots. In the rare case that a property should contain a dot in its name, one may start the query with a space to prevent SMW from interpreting this dot in a special way.
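That a property chain and the corresponding subquery select the same pages can be checked on a toy data set. The Python sketch below uses hypothetical facts and helper names and leaves out the category condition for brevity; it only models the semantics described above.

```python
# Toy data: each page maps property names to the set of annotated values.
facts = {
    "Rome": {"located in": {"Italy"}},
    "Oslo": {"located in": {"Norway"}},
    "Italy": {"member of": {"European Union"}},
    "Norway": {"member of": set()},
}


def having(prop, targets):
    """Pages whose values for prop include at least one of the targets."""
    return {page for page, ann in facts.items() if ann.get(prop, set()) & targets}


# Models [[located in::<q>[[member of::European Union]]</q>]]:
# evaluate the inner query first, then use its results as allowed values.
members = having("member of", {"European Union"})
via_subquery = having("located in", members)


def chain(props, value):
    """Models [[p1.p2. ... ::value]] by resolving the chain from right to left."""
    targets = {value}
    for prop in reversed(props):
        targets = having(prop, targets)
    return targets


via_chain = chain(["located in", "member of"], "European Union")
print(via_subquery, via_chain)
```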
NOTE: It is not possible to use a subquery to obtain a list of properties that is then used in a query. See #Subqueries for properties below.
Using templates and variables
Arbitrary templates and variables can be used in a query. An example is a selection criterion that displays all future events based on the current date:
[[Category:Event]]
[[end date::>{{CURRENTYEAR}}-{{CURRENTMONTH}}-{{CURRENTDAY}}]]
Another particularly useful variable for inline queries is {{FULLPAGENAME}} for the current page with namespace, which allows you to reuse a generic query on many pages. Read about inline queries for more information.
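To see what the date-based query above looks like once the variables have been expanded, one can mimic the expansion in Python. This is a sketch, assuming that {{CURRENTMONTH}} is zero-padded and {{CURRENTDAY}} is not, as the MediaWiki magic word documentation describes; the function name is hypothetical.

```python
from datetime import date


def future_events_query(today):
    """Build the query text as it would read after variable expansion."""
    # Assumption: CURRENTMONTH yields "05" for May, CURRENTDAY yields "6" (unpadded).
    expanded = f"{today.year}-{today.month:02d}-{today.day}"
    return f"[[Category:Event]] [[end date::>{expanded}]]"


print(future_events_query(date(2024, 5, 6)))
```

Because the wiki substitutes the variables before the query is evaluated, SMW only ever sees a fixed date value in the condition.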
Sorting results
It is often helpful to present query results in a suitable order, for example to present a list of European countries ordered by population. Special:Ask has a simple interface to add a sorting condition to a query. The name of the property to sort by is entered into a text input, and ascending or descending order can be selected. SMW will usually attempt to sort results by the natural order that the values of the selected property may have: numbers are sorted numerically, strings are sorted alphabetically, dates are sorted chronologically. The order therefore is the same as in the case of the < and > comparators in queries. If no specific sorting condition is provided, results will be ordered by their page name.
It is possible to provide more than one sorting condition. If multiple results turn out to be equal regarding the first sorting condition, the next condition is used to order them and so on. A query for actors, e.g., could be ordered by year of birth and use the last name of the actor as a second ordering condition. All actors that were born in the same year would thus be ordered alphabetically by their last name instead of appearing in random order.
Sorting a query can also influence the result of a query, because it is only possible to sort by property values that a page actually has. Therefore, if a query is ordered by a property (say «Population») then SMW will usually restrict the query results to those pages that have at least one value for this property (i.e. only pages with a specified population appear). If the query does not yet require that the property is present in each query result, then SMW will silently add this condition. But SMW will always try to find the ordering property within the given query first, and it is even possible to order query results by subproperties. Some examples should illustrate this:
- [[Category:Motor]] [[Motor RPM - Rated::+]] ordered by «Motor RPM - Rated» will present the motors with rated RPM in ascending order. The query result is the same as without the sorting.
- [[Category:Motor]] ordered by «Motor RPM - Rated» will again present the motors with RPM ratings in ascending order. The query result may be modified due to the sorting condition: if there are motors without an RPM rating given, then these will no longer appear in the result.
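The sorting rules above, several sorting conditions plus the silent removal of pages that lack the sort property, can be modelled as follows. This Python sketch uses hypothetical data and a hypothetical `sort_results` helper; it illustrates the described behaviour and is not SMW's implementation.

```python
# Toy result set; "Actor C" has no birth year annotated.
actors = [
    {"page": "Actor A", "birth year": 1950, "last name": "Stone"},
    {"page": "Actor B", "birth year": 1950, "last name": "Adams"},
    {"page": "Actor C", "last name": "Baker"},
    {"page": "Actor D", "birth year": 1942, "last name": "Young"},
]


def sort_results(results, sort_props):
    """Sort ascending by several properties, dropping pages that lack any of them."""
    # Sorting is only possible on values a page actually has.
    kept = [r for r in results if all(p in r for p in sort_props)]
    # Later properties only break ties left by earlier ones.
    return sorted(kept, key=lambda r: tuple(r[p] for p in sort_props))


ordered = [r["page"] for r in sort_results(actors, ["birth year", "last name"])]
print(ordered)
```

"Actor C" disappears from the result because it has no value for the first sort property, and the two actors born in 1950 are ordered by last name.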
If a property that is used for sorting has more than one value for some page, then this page will still appear only once in the result list. The position that the page takes in this case is not defined by SMW and may correspond to either of the property values.
Query results displayed in a result table can also be ordered dynamically by clicking on the small sort icons found in the table heading of each column. This function requires JavaScript to be enabled in the browser and will sort only the displayed results. So if, e.g., a query has retrieved the twenty world-largest cities by population, it is possible to sort these twenty cities alphabetically or in reverse order of population, but the query will certainly not show the twenty world-smallest cities when reversing the order of the population column. The dynamic sorting of tables attempts to use the same order as used in SMW queries, and in particular orders numbers and dates in a natural way. However, the alphabetical order of strings and page names may slightly vary from the wiki's alphabetic order, simply because there are many international alphabets that can be ordered in different ways depending on the language preference.
Linking to Semantic Search Results
Links to semantic query results on Special:Ask can be created by means of the inline query feature in SMW as explained in its documentation. It is not recommended to create links directly, since they are very lengthy and use a specific encoding. Developers who create extensions that link to Special:Ask should also use SMW's internal functions for building links. Understanding the details of SMW's encoding of queries in links is therefore not required for using SMW.
Things that are not possible
Subqueries for properties
It is not possible to use a subquery to obtain a list of properties that is then used in a query. One can, however, use a query that returns a list of properties, and copy and paste the result into another query. Alternatively, one can use the template results format to pass properties directly to another query.
Queries with special properties
SMW currently does not support queries for the values of any of SMW's built-in Special properties such as «Has type», «Allows value» or «Equivalent URI».
Queries in Semantic MediaWiki return a list of pages, and the default result of a query therefore simply displays the selected pages' titles. Additional information, such as a page's property values or categories, can be included in a query result by using additional printout statements that are introduced here. In Special:Ask, printout statements can simply be entered into the input box on the right, with one statement per line.
There are different kinds of printout statements, but all of them can be recognized by the question mark ? that they start with. The important difference between printout statements and query descriptions is that the former do not restrict the result set in any way: even if some printout has no values for a given page, an empty field will be printed, but the page is still part of the result.
Printing property values
The most common form of printout statement is the property printout, which makes SMW display all values assigned to a certain property. These are written simply as a question mark followed by the property name, e.g.
?Model
prints the values for «model» of all query results. On Special:Ask, the result of each printout is shown in a table column that is labeled by the name of the property. It is possible to change that label for a printout, and this will be very useful when using queries on wiki pages (it is not really relevant on Special:Ask of course). The equality symbol is used to change the label:
?model = Car Model
The above still prints the model, but with the modified label in the table header. As mentioned above, property printouts may have an empty result for some pages, e.g. if a page does not have any model. Property conditions with wildcards (see above) can be used to ensure that all elements in a query result have some value for the printed property, if this is desired.
Printing categories
There are two ways to print category information: either SMW prints all categories assigned to some page, or SMW checks for one particular category. The first case is achieved by the printout
?Category
where «Category» is the name of the Category namespace in the local language. This printout will show all categories that are directly used on a result page. The other option is to ask for one particular category, such as
?Category:Motor
The result then will contain a column «Motor» that contains X for all pages that directly belong to that category, and is empty otherwise. Again, one can change the label using equality:
?Category:Motor = M
will merely display an «M» as the header of the result column, which might be more sensible given that the entries in that column are very short. It is also possible to change the way in which this kind of category printout is formatted, as described below.
The main result column
All queries by default display the main list of result pages in the first column. In some cases, it can be useful to move that to another position. This is not relevant for Special:Ask, but can be quite useful in inline queries. A special printout statement is available for this purpose:
?
This single question mark addresses the «unlabelled result column» that shows the main result list. As before, different labels can be assigned with the equality symbol, e.g.
? = Results
Display format
Many printout statements can be further customised by giving a printout format which can be given after a property name, separated by the symbol #. The available formats depend on the type of the printout and involved property.
Plain (unformatted) printouts
A general format that is supported by most types of printouts is the plain format (or empty format), available since SMW 1.4.3. Printouts with this format will avoid all forms of beautification or linking in their presentation, and return a plain value instead. This is particularly useful when results are further processed in templates or parser functions. To select the plain output format, a hyphen ("-") or simply nothing is used as a printout string, as in the following examples:
?Make# -
?Wheelbase #
Both printouts select the plain format. Spaces do not matter and can be inserted to increase readability. For numerical properties like the Wheelbase number, the plain format is a simple number string without commas to separate digits. For properties of type page, the plain output is simply the name of the page without any link.
Formats for specific printout types
For properties that support units, queries can thus determine which unit should be used for the output. To print the Wheelbase in inches, e.g., one would use the following:
?Wheelbase#in
This assumes that the property Wheelbase is aware of the unit «in». For properties of type date, the output format "ISO" is available to obtain results in a technical format that conforms to the ISO 8601 standard. Other datatypes may have different printout formats. See the types documentation for details.
For printouts of the form ?Category:Slot Car, the display format can be used to modify what SMW will display for cases where a page is (or is not) in the category. The following is an example:
?Category:Slot Car#a slot car, not a slot car
This will show the text «a slot car» for all pages that are slot cars, and the text «not a slot car» otherwise. This can, for example, also be used in combination with small images to display icons for certain categories.
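The general shape of a printout statement, a question mark, a property or category name, an optional format after # and an optional label after =, can be made explicit with a small parser. The Python sketch below is a hypothetical helper that models the syntax described on this page; it does not handle formats that themselves contain an equality symbol.

```python
def parse_printout(statement):
    """Split '?Name#format = Label' into its parts.

    Hypothetical helper; a missing format or label is reported as None.
    """
    if not statement.startswith("?"):
        raise ValueError("printout statements start with '?'")
    body = statement[1:]
    # Everything after the first '=' is the column label.
    target, eq, label = body.partition("=")
    # Everything after the first '#' in the remaining part is the format.
    name, hash_mark, fmt = target.partition("#")
    return {
        "property": name.strip(),
        "format": fmt.strip() if hash_mark else None,
        "label": label.strip() if eq else None,
    }


print(parse_printout("?model = Car Model"))
print(parse_printout("?Wheelbase#in"))
print(parse_printout("?Category:Slot Car#a slot car, not a slot car"))
```

In this model an empty format string (as in «?Wheelbase #») or a hyphen stands for the plain format, and an empty property name (as in «? = Results») stands for the main result column.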

