Difference between revisions of "Approximate search"

From Freeplane - free mind mapping and knowledge management software

Revision as of 15:45, 26 February 2012

Approximate Search in Freeplane

Starting with 1.2.x, Freeplane includes an 'Approximate Search' feature that finds content (nodes, notes, and so on) which does not exactly match the search term but differs from it by only a small number of changes (the "distance" between the search term and the content). See the section Search results below for some examples of approximate searching.

How to run (approximate) Search

Press Ctrl-F or select View->Toolbars->Filter toolbar to show the "filter/search" toolbar, then enable approximate searching by ticking the "Approximate" checkbox in the filter toolbar (Alt+A). Enter a search term in the search box () and press Enter or click the Find next button (the icon with a blue right arrow in the filter toolbar) to start the search. By default the search checks whether the term is contained in the content. To check for equality instead (search term equals content), change the default "Contains" in the search type combobox to "Is equal to". You can define more complex search rules in the Filter->Compose filter dialog.

Search results

Once you activate Approximate matching, all matching done with the filter tool is approximate, for example:

  • you can search for "file" and it will find "flie" (typo)
  • you can search for "hobbies" and it will find "hobbys"
  • if you use Freeplane for program source code: you can search for "NumberFormat" and it will find "aNumber_Format"
  • "network" = "netzwerk"
  • ...

There are also limitations for this search feature! Whether content counts as a match depends on the edit distance between the search term and the content (see the threshold section below). As a result you get false positives (e.g. "print"="pointer" for a 'Contains' match) and false negatives (e.g. "fitness center"!="fitness studio", or for an equality match "RandomFileAccess"!="RandomAccessFolder").
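The 'Contains' cases above can be reproduced with a small sketch. It assumes that a 'Contains' match compares the search term against the best-fitting substring of the content and then applies the rule from the threshold section with the default minProb=0.65. How Freeplane actually implements 'Contains' is not stated on this page, so the function names and the substring approach (plain Levenshtein distance with a free start and end, without transpositions) are illustrative assumptions only.

```python
def substring_distance(term: str, content: str) -> int:
    """Smallest Levenshtein distance between term and any substring of content."""
    # Row 0 is all zeros: a candidate substring may start anywhere at no cost.
    prev = [0] * (len(content) + 1)
    for i, t in enumerate(term, start=1):
        cur = [i]  # the first i term characters matched against nothing cost i
        for j, c in enumerate(content, start=1):
            cur.append(min(
                prev[j] + 1,             # term character without a counterpart
                cur[j - 1] + 1,          # extra character in the content
                prev[j - 1] + (t != c),  # exact match (0) or substitution (1)
            ))
        prev = cur
    # The substring may also end anywhere, so take the best cell of the last row.
    return min(prev)


def contains_approximately(term: str, content: str, min_prob: float = 0.65) -> bool:
    """Hypothetical 'Contains' check: (|x| - dist) / |x| > minProb."""
    if not term:
        raise ValueError("search term must not be empty")
    score = (len(term) - substring_distance(term, content)) / len(term)
    return score > min_prob


print(contains_approximately("print", "pointer"))
print(contains_approximately("fitness center", "fitness studio"))
```

Under these assumptions "print" is one substitution away from the substring "point" of "pointer", so it scores 4/5 = 0.8 and is reported as a hit, while "fitness center" scores 9/14 (about 0.64) against "fitness studio" and narrowly misses the default threshold.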

Configuring the threshold

Matching is based on edit distance (the Damerau-Levenshtein algorithm, which the popular library Lucene also uses): a search term x and content y are considered a hit if (|x| - dist(x,y))/|x| > minProb, where |x| is the length of the search term and dist(x,y) is the edit distance between x and y. The threshold minProb for approximate matching can be customized in the Tools->Preferences dialog, tab Behavior, section Search. The lower the value, the more variations of the search term will be found. minProb=0.15 will find almost anything, while minProb=1.0 will do exact string matching. The results above were observed using the default minProb=0.65.
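A minimal sketch of the rule above for an equality-style comparison (the whole search term against the whole content). It is not Freeplane's code: the function names are made up for illustration, and the distance is the restricted "optimal string alignment" variant of Damerau-Levenshtein, chosen here only because it is short.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and swaps
    of two adjacent characters (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "il" <-> "li"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def is_match(term: str, content: str, min_prob: float = 0.65) -> bool:
    """Apply (|x| - dist(x, y)) / |x| > minProb with x = term, y = content."""
    if not term:
        raise ValueError("search term must not be empty")
    score = (len(term) - osa_distance(term, content)) / len(term)
    return score > min_prob


for term, content in [("file", "flie"), ("hobbies", "hobbys"),
                      ("network", "netzwerk"),
                      ("RandomFileAccess", "RandomAccessFolder")]:
    print(term, content, osa_distance(term, content), is_match(term, content))
```

With the default threshold, "file"/"flie" counts as a single transposition and scores 3/4 = 0.75, so it matches. Without the transposition step the distance would be 2 and the score 0.5, which is below the threshold. Taken literally, the strict inequality would reject even identical strings at minProb=1.0, so Freeplane presumably handles that boundary differently; the sketch does not try to guess how.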

Implementation details

Feedback

Please edit this section (if you have the privileges) or send an email to fnatter@gmx.net. Your feedback is very much appreciated!