11 Commits
v0.2 ... v0.3

Author SHA1 Message Date
Alex Schroeder 6f61dde12a Test search and fix bugs 2023-08-24 14:14:33 +02:00
Alex Schroeder 5b29e6433a Add page tests 2023-08-24 13:06:02 +02:00
Alex Schroeder 2bd20432e2 More snippet and highlight testing and fixing 2023-08-24 12:42:50 +02:00
Alex Schroeder 49d62a7979 Split highlight.go from snippets.go 2023-08-24 10:33:32 +02:00
Alex Schroeder 071e807886 Add snippets test 2023-08-24 10:32:07 +02:00
Alex Schroeder df1fdf4373 Add indexing limitation 2023-08-24 10:12:54 +02:00
Alex Schroeder 08b63ae84b Add scoring 2023-08-24 10:00:39 +02:00
Alex Schroeder 645a87e5c8 Split page.go 2023-08-24 08:57:36 +02:00
Alex Schroeder c54e41da28 Comments. handleTitle takes arg. 2023-08-24 08:51:51 +02:00
  handleTitle now takes an argument so that the edit page can show the
  page title and still have an untouched Page.Body for editing.
Alex Schroeder 6a4e014d1c Split search.go and snippets.go 2023-08-24 08:29:53 +02:00
Alex Schroeder 2142144b0c Add search 2023-08-23 23:27:34 +02:00
15 changed files with 701 additions and 101 deletions

.gitignore (vendored): 1 line changed

@@ -1 +1,2 @@
/oddmu
test.md


@@ -35,6 +35,15 @@ extension.
`{{printf "%s" .Body}}` is the Markdown, as a string (the data itself
is a byte array and that's why we need to call `printf`).
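As a minimal, self-contained sketch of this point (the template text and struct here are illustrative, not Oddmu's actual templates), `printf "%s"` is what turns the byte slice into text:

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// render executes a tiny template against a struct whose Body field is
// a byte slice, mirroring how the page templates format Page.Body.
func render(body []byte) string {
	t := template.Must(template.New("demo").Parse(`{{printf "%s" .Body}}`))
	var buf bytes.Buffer
	if err := t.Execute(&buf, struct{ Body []byte }{Body: body}); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	fmt.Println(render([]byte("Did you bring a towel?")))
}
```

Without the `printf`, the byte slice would be formatted with the default `%v` verb as a list of byte values rather than as text.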
When calling the `save` action, the page name is taken from the URL and
the page content is taken from the `body` form parameter. To
illustrate, here's how to edit a page using `curl`:
```sh
curl --form body="Did you bring a towel?" \
http://localhost:8080/save/welcome
```
## Building
```sh
@@ -126,15 +135,15 @@ MDCertificateAgreement accepted
RewriteEngine on
RewriteRule ^/$ http://%{HTTP_HOST}:8080/view/index [redirect]
RewriteRule ^/(view|edit|save)/(.*) http://%{HTTP_HOST}:8080/$1/$2 [proxy]
RewriteRule ^/(view|edit|save|search)/(.*) http://%{HTTP_HOST}:8080/$1/$2 [proxy]
</VirtualHost>
```
First, it manages the domain, getting the necessary certificates. It
redirects regular HTTP traffic from port 80 to port 443. It turns on
the SSL engine for port 443. It redirects `/` to `/view/index` and any
path that starts with `/view/`, `/edit/` or `/save/` is proxied to
port 8080 where the Oddmu program can handle it.
path that starts with `/view/`, `/edit/`, `/save/` or `/search/` is
proxied to port 8080 where the Oddmu program can handle it.
Thus, this is what happens:
@@ -208,9 +217,13 @@ DocumentRoot /home/oddmu/static
Create this directory, making sure to give it a permission that your
webserver can read (world readable file, world readable and executable
directory). Populate it with files. For example, create a file called
`robots.txt` containing the following, telling all robots that they're
not welcome.
directory). Populate it with files.
Make sure that none of the static files look like the wiki paths
`/view/`, `/edit/`, `/save/` or `/search/`.
For example, create a file called `robots.txt` containing the
following, telling all robots that they're not welcome.
```text
User-agent: *
@@ -223,9 +236,6 @@ and without needing a wiki page.
[Wikipedia](https://en.wikipedia.org/wiki/Robot_exclusion_standard)
has more information.
All you have to make sure is that none of the static files look like the
wiki paths `/view/`, `/edit/` or `/save/`.
## Customization (with recompilation)
The Markdown parser can be customized and
@@ -242,61 +252,33 @@ rocket links (`=>`). Here's how to modify the `loadPage` so that a
translated into Markdown:
```go
func loadPage(title string) (*Page, error) {
filename := title + ".md"
func loadPage(name string) (*Page, error) {
filename := name + ".md"
body, err := os.ReadFile(filename)
if err == nil {
return &Page{Title: title, Name: title, Body: body}, nil
return &Page{Title: name, Name: name, Body: body}, nil
}
filename = title + ".gmi"
filename = name + ".gmi"
body, err = os.ReadFile(filename)
if err == nil {
return &Page{Title: title, Name: title, Body: body}, nil
return &Page{Title: name, Name: name, Body: body}, nil
}
return nil, err
}
```
There is a small problem, however: By default, Markdown expects an
empty line before a list begins. The following change to `viewHandler`
empty line before a list begins. The following change to `renderHtml`
uses the `NoEmptyLineBeforeBlock` extension for the parser:
```go
func viewHandler(w http.ResponseWriter, r *http.Request, title string) {
// Short cut for text files
if (strings.HasSuffix(title, ".txt")) {
body, err := os.ReadFile(title)
if err == nil {
w.Write(body)
return
}
}
// Attempt to load Markdown or Gemini page; edit it if this fails
p, err := loadPage(title)
if err != nil {
http.Redirect(w, r, "/edit/"+title, http.StatusFound)
return
}
// Render the Markdown to HTML, extracting a title and
// possibly sanitizing it
s := string(p.Body)
m := titleRegexp.FindStringSubmatch(s)
if m != nil {
p.Title = m[1]
p.Body = []byte(strings.Replace(s, m[0], "", 1))
}
func (p* Page) renderHtml() {
// Here is where a new extension is added!
extensions := parser.CommonExtensions | parser.NoEmptyLineBeforeBlock
markdownParser := parser.NewWithExtensions(extensions)
flags := html.CommonFlags
opts := html.RendererOptions{
Flags: flags,
}
htmlRenderer := html.NewRenderer(opts)
maybeUnsafeHTML := markdown.ToHTML(p.Body, markdownParser, htmlRenderer)
maybeUnsafeHTML := markdown.ToHTML(p.Body, markdownParser, nil)
html := bluemonday.UGCPolicy().SanitizeBytes(maybeUnsafeHTML)
p.Html = template.HTML(html);
renderTemplate(w, "view", p)
}
```
@@ -306,6 +288,10 @@ Page titles are filenames with `.md` appended. If your filesystem
cannot handle it, it can't be a page title. Specifically, *no slashes*
in filenames.
The pages are indexed as the server starts and the index is kept in
memory. If you have a ton of pages, this surely wastes a lot of
memory.
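To make the memory concern concrete, here is a rough sketch of why an in-memory trigram index grows with the corpus; the `trigrams` helper below is a simplified stand-in, not the actual go-trigram implementation:

```go
package main

import "fmt"

// trigrams returns the set of unique three-byte substrings of s. A
// trigram index must keep an entry for every such substring it has
// seen, which is why the index grows with the amount of page text.
func trigrams(s string) map[string]bool {
	set := make(map[string]bool)
	for i := 0; i+3 <= len(s); i++ {
		set[s[i:i+3]] = true
	}
	return set
}

func main() {
	body := "The pages are indexed as the server starts."
	fmt.Printf("%d bytes, %d unique trigrams\n", len(body), len(trigrams(body)))
}
```

Every distinct three-byte window of every page ends up as a key, so memory use is roughly proportional to the amount of unique page text.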
## References
[Writing Web Applications](https://golang.org/doc/articles/wiki/)

go.mod: 1 line changed

@@ -3,6 +3,7 @@ module alexschroeder.ch/cgit/oddmu
go 1.21.0
require (
github.com/dgryski/go-trigram v0.0.0-20160407183937-79ec494e1ad0
github.com/gomarkdown/markdown v0.0.0-20230716120725-531d2d74bc12
github.com/microcosm-cc/bluemonday v1.0.25
)

go.sum: 2 lines changed

@@ -1,5 +1,7 @@
github.com/aymerick/douceur v0.2.0 h1:Mv+mAeH1Q+n9Fr+oyamOlAkUNPWPlA8PPGR0QAaYuPk=
github.com/aymerick/douceur v0.2.0/go.mod h1:wlT5vV2O3h55X9m7iVYN0TBM0NH/MmbLnd30/FjWUq4=
github.com/dgryski/go-trigram v0.0.0-20160407183937-79ec494e1ad0 h1:b+7JSiBM+hnLQjP/lXztks5hnLt1PS46hktG9VOJgzo=
github.com/dgryski/go-trigram v0.0.0-20160407183937-79ec494e1ad0/go.mod h1:qzKC/DpcxK67zaSHdCmIv3L9WJViHVinYXN2S7l3RM8=
github.com/gomarkdown/markdown v0.0.0-20230716120725-531d2d74bc12 h1:uK3X/2mt4tbSGoHvbLBHUny7CKiuwUip3MArtukol4E=
github.com/gomarkdown/markdown v0.0.0-20230716120725-531d2d74bc12/go.mod h1:JDGcbDT52eL4fju3sZ4TeHGsQwhG9nbDV21aMyhwPoA=
github.com/gorilla/css v1.0.0 h1:BQqNyPTi50JCFMTw/b67hByjMVXZRwGha6wxVGkeihY=

highlight.go (new file): 45 lines

@@ -0,0 +1,45 @@
package main
import (
"strings"
"regexp"
)
// highlight splits the query string q into terms and highlights them
// using the bold tag. It returns the highlighted string and a score.
func highlight (q string, s string) (string, int) {
c := 0
re, err := regexp.Compile("(?i)" + q)
if err == nil {
m := re.FindAllString(s, -1)
if m != nil {
// Score increases for each full match of q.
c += len(m)
}
}
for _, v := range strings.Split(q, " ") {
if len(v) == 0 {
continue
}
re, err := regexp.Compile(`(?is)(\pL?)(` + v + `)(\pL?)`)
if err != nil {
continue
}
r := make(map[string]string)
for _, m := range re.FindAllStringSubmatch(s, -1) {
// Term matched increases the score.
c++
// Terms matching at the beginning and
// end of words and matching entire
// words increase the score further.
if len(m[1]) == 0 { c++ }
if len(m[3]) == 0 { c++ }
if len(m[1]) == 0 && len(m[3]) == 0 { c++ }
r[m[2]] = "<b>" + m[2] + "</b>"
}
for old, new := range r {
s = strings.ReplaceAll(s, old, new)
}
}
return s, c
}

highlight_test.go (new file): 63 lines

@@ -0,0 +1,63 @@
package main
import (
"testing"
)
func TestHighlight(t *testing.T) {
s := `The windows opens
A wave of car noise hits me
No birds to be heard.`
h := `The <b>window</b>s opens
A wave of car noise hits me
No birds to be heard.`
q := "window"
r, c := highlight(q, s)
if r != h {
t.Logf("The highlighting is wrong in 「%s」", r)
t.Fail()
}
// Score:
// - q itself
// - the single token
// - the beginning of a word
if c != 3 {
t.Logf("%s score is %d", q, c)
t.Fail()
}
q = "windows"
_, c = highlight(q, s)
// Score:
// - q itself
// - the single token
// - the beginning of a word
// - the end of a word
// - the whole word
if c != 5 {
t.Logf("%s score is %d", q, c)
t.Fail()
}
q = "car noise"
_, c = highlight(q, s)
// Score:
// - car noise (+1)
// - car, with beginning, end, whole word (+4)
// - noise, with beginning, end, whole word (+4)
if c != 9 {
t.Logf("%s score is %d", q, c)
t.Fail()
}
q = "noise car"
_, c = highlight(q, s)
// Score:
// - the car token
// - the noise token
// - each with beginning, end and whole token (3 each)
if c != 8 {
t.Logf("%s score is %d", q, c)
t.Fail()
}
}

page.go (new file): 111 lines

@@ -0,0 +1,111 @@
package main
import (
"github.com/gomarkdown/markdown"
"github.com/gomarkdown/markdown/ast"
"github.com/gomarkdown/markdown/parser"
"github.com/microcosm-cc/bluemonday"
"html/template"
"strings"
"bytes"
"os"
)
// Page is a struct containing information about a single page. Title
// is the title extracted from the page content using titleRegexp.
// Name is the filename without extension (so a filename of "foo.md"
// results in the Name "foo"). Body is the Markdown content of the
// page and Html is the rendered HTML for that Markdown. Score is a
// number indicating how well the page matched for a search query.
type Page struct {
Title string
Name string
Body []byte
Html template.HTML
Score int
}
// save saves a Page. The filename is based on the Page.Name and gets
// the ".md" extension. Page.Body is saved, without any carriage
// return characters ("\r"). The file permissions used are readable
// and writeable for the current user, i.e. u+rw or 0600. Page.Title
and Page.Html are not saved. There is no caching.
func (p *Page) save() error {
filename := p.Name + ".md"
s := bytes.ReplaceAll(p.Body, []byte{'\r'}, []byte{})
p.Body = s
p.updateIndex()
return os.WriteFile(filename, s, 0600)
}
// loadPage loads a Page given a name. The filename loaded is that
// Page.Name with the ".md" extension. The Page.Title is set to the
// Page.Name (and possibly changed, later). The Page.Body is set to
// the file content. The Page.Html remains undefined (there is no
// caching).
func loadPage(name string) (*Page, error) {
filename := name + ".md"
body, err := os.ReadFile(filename)
if err != nil {
return nil, err
}
return &Page{Title: name, Name: name, Body: body}, nil
}
// handleTitle extracts the title from a Page and sets Page.Title, if
// any. If replace is true, the page title is also removed from
// Page.Body. Make sure not to save this! This is only for rendering.
func (p* Page) handleTitle(replace bool) {
s := string(p.Body)
m := titleRegexp.FindStringSubmatch(s)
if m != nil {
p.Title = m[1]
if replace {
p.Body = []byte(strings.Replace(s, m[0], "", 1))
}
}
}
// renderHtml renders the Page.Body to HTML and sets Page.Html.
func (p* Page) renderHtml() {
maybeUnsafeHTML := markdown.ToHTML(p.Body, nil, nil)
html := bluemonday.UGCPolicy().SanitizeBytes(maybeUnsafeHTML)
p.Html = template.HTML(html);
}
// plainText renders the Page.Body to plain text and returns it,
// ignoring all the Markdown and all the newlines. The result is one
// long single line of text.
func (p* Page) plainText() string {
parser := parser.New()
doc := markdown.Parse(p.Body, parser)
text := []byte("")
ast.WalkFunc(doc, func(node ast.Node, entering bool) ast.WalkStatus {
if entering && node.AsLeaf() != nil {
text = append(text, node.AsLeaf().Literal...)
text = append(text, []byte(" ")...)
}
return ast.GoToNext
})
// Some Markdown still contains newlines
for i, c := range text {
if c == '\n' {
text[i] = ' '
}
}
// Remove trailing space
for text[len(text)-1] == ' ' {
text = text[0:len(text)-1]
}
return string(text)
}
// summarize for query string q sets Page.Html to an extract.
func (p* Page) summarize(q string) {
p.handleTitle(true)
s, c := snippets(q, p.plainText())
p.Score = c
extract := []byte(s)
html := bluemonday.UGCPolicy().SanitizeBytes(extract)
p.Html = template.HTML(html)
}

page_test.go (new file): 59 lines

@@ -0,0 +1,59 @@
package main
import (
"strings"
"testing"
)
func TestPageTitle (t *testing.T) {
p := &Page{Body: []byte(`# Ache
My back aches for you
I sit, stare and type for hours
But yearn for blue sky`)}
p.handleTitle(false)
if p.Title != "Ache" {
t.Logf("The page title was not extracted correctly: %s", p.Title)
t.Fail()
}
if !strings.HasPrefix(string(p.Body), "# Ache") {
t.Logf("The page title was removed: %s", p.Body)
t.Fail()
}
p.handleTitle(true)
if !strings.HasPrefix(string(p.Body), "My back") {
t.Logf("The page title was not removed: %s", p.Body)
t.Fail()
}
}
func TestPagePlainText (t *testing.T) {
p := &Page{Body: []byte(`# Water
The air will not come
To inhale is an effort
The summer heat kills`)}
s := p.plainText()
r := "Water The air will not come To inhale is an effort The summer heat kills"
if s != r {
t.Logf("The plain text version is wrong: %s", s)
t.Fail()
}
}
func TestPageHtml (t *testing.T) {
p := &Page{Body: []byte(`# Sun
Silver leaves shine bright
They droop, boneless, weak and sad
A cruel sun stares down`)}
p.renderHtml()
s := string(p.Html)
r := `<h1>Sun</h1>
<p>Silver leaves shine bright
They droop, boneless, weak and sad
A cruel sun stares down</p>
`
if s != r {
t.Logf("The HTML is wrong: %s", s)
t.Fail()
}
}

search.go (new file): 110 lines

@@ -0,0 +1,110 @@
package main
import (
trigram "github.com/dgryski/go-trigram"
"path/filepath"
"strings"
"slices"
"io/fs"
"fmt"
)
// Search is a struct containing the result of a search. Query is the
// query string and Items is the array of pages with the result.
// Currently there is no pagination of results! When a page is part of
// a search result, Body and Html are simple extracts.
type Search struct {
Query string
Items []Page
Results bool
}
// index is a struct containing the trigram index for search. It is
// generated at startup and updated after every page edit.
var index trigram.Index
// documents is a map, mapping document ids of the index to page
// names.
var documents map[trigram.DocID]string
func indexAdd(path string, info fs.FileInfo, err error) error {
if err != nil {
return err
}
filename := path
if info.IsDir() || strings.HasPrefix(filename, ".") || !strings.HasSuffix(filename, ".md") {
return nil
}
name := strings.TrimSuffix(filename, ".md")
p, err := loadPage(name)
if err != nil {
return err
}
id := index.Add(string(p.Body))
documents[id] = p.Name
return nil
}
func loadIndex() error {
index = make(trigram.Index)
documents = make(map[trigram.DocID]string)
err := filepath.Walk(".", indexAdd)
if err != nil {
fmt.Println("Indexing failed")
index = nil
documents = nil
}
return err
}
func (p *Page) updateIndex() {
var id trigram.DocID
for docId, name := range documents {
if name == p.Name {
id = docId
break
}
}
if id == 0 {
id = index.Add(string(p.Body))
documents[id] = p.Name
} else {
o, err := loadPage(p.Name)
if err == nil {
index.Delete(string(o.Body), id)
}
index.Insert(string(p.Body), id)
}
}
// search returns a sorted []Page where each page contains an extract
// of the actual Page.Body in its Page.Html.
func search(q string) []Page {
ids := index.Query(q)
items := make([]Page, len(ids))
for i, id := range ids {
name := documents[id]
p, err := loadPage(name)
if err != nil {
fmt.Printf("Error loading %s\n", name)
} else {
p.summarize(q)
items[i] = *p
}
}
fn := func(a, b Page) int {
if a.Score < b.Score {
return 1
} else if a.Score > b.Score {
return -1
} else if a.Title < b.Title {
return -1
} else if a.Title > b.Title {
return 1
} else {
return 0
}
}
slices.SortFunc(items, fn)
return items
}

search.html (new file): 28 lines

@@ -0,0 +1,28 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="format-detection" content="telephone=no">
<meta name="viewport" content="width=device-width">
<title>Search for {{.Query}}</title>
<style>
html { max-width: 70ch; padding: 2ch; margin: auto; color: #111; background: #ffe; }
img { max-width: 20%; }
.result { font-size: larger }
.score { font-size: smaller; opacity: 0.8; }
</style>
</head>
<body>
<h1>Search for {{.Query}}</h1>
<div>
{{if .Results}}
{{range .Items}}
<p><a class="result" href="/view/{{.Name}}">{{.Title}}</a> <span class="score">{{.Score}}</span></p>
<blockquote>{{.Html}}</blockquote>
{{end}}
{{else}}
<p>No results.</p>
{{end}}
</div>
</body>
</html>

search_test.go (new file): 71 lines

@@ -0,0 +1,71 @@
package main
import (
"testing"
"strings"
"os"
)
var name string = "test"
// TestIndex relies on README.md being indexed
func TestIndex (t *testing.T) {
_ = os.Remove(name + ".md")
loadIndex()
q := "Oddµ"
pages := search(q)
if len(pages) == 0 {
t.Log("Search found no result")
t.Fail()
}
for _, p := range pages {
if !strings.Contains(string(p.Body), q) {
t.Logf("Page %s does not contain %s", p.Name, q)
t.Fail()
}
if p.Score == 0 {
t.Logf("Page %s has no score", p.Name)
t.Fail()
}
}
p := &Page{Name: name, Body: []byte("This is a test.")}
p.save()
pages = search("This is a test")
found := false
for _, p := range pages {
if p.Name == name {
found = true
break
}
}
if !found {
t.Logf("Page '%s' was not found", name)
t.Fail()
}
p = &Page{Name: name, Body: []byte("Guvf vf n grfg.")}
p.save()
pages = search("This is a test")
found = false
for _, p := range pages {
if p.Name == name {
found = true
break
}
}
if found {
t.Logf("Page '%s' was still found using the old content: %s", name, p.Body)
t.Fail()
}
pages = search("Guvf")
found = false
for _, p := range pages {
if p.Name == name {
found = true
break
}
}
if !found {
t.Logf("Page '%s' not found using the new content: %s", name, p.Body)
t.Fail()
}
}

snippets.go (new file): 79 lines

@@ -0,0 +1,79 @@
package main
import (
"strings"
"regexp"
)
func snippets (q string, s string) (string, int) {
// Look for Snippets
snippetlen := 100
maxsnippets := 4
// Compile the query as a regular expression
re, err := regexp.Compile("(?i)(" + strings.Join(strings.Split(q, " "), "|") + ")")
// If the compilation didn't work, truncate
if err != nil || len(s) <= snippetlen {
if len(s) > 400 {
s = s[0:400]
}
return highlight(q, s)
}
// show a snippet from the beginning of the document
j := strings.LastIndex(s[:snippetlen], " ")
if j == -1 {
// OK, look for a longer word
j = strings.Index(s, " ")
if j == -1 {
// Or just truncate the body.
if len(s) > 400 {
s = s[0:400]
}
return highlight(q, s)
}
}
t := s[0:j]
res := t + " …"
s = s[j:] // avoid rematching
jsnippet := 0
for jsnippet < maxsnippets {
m := re.FindStringSubmatch(s)
if m == nil {
break
}
jsnippet++
j = strings.Index(s, m[1])
if j > -1 {
// get the substring containing the start of
// the match, ending on word boundaries
from := j - snippetlen / 2
if from < 0 {
from = 0
}
start := strings.Index(s[from:], " ")
if start == -1 {
start = 0
} else {
start += from
}
to := j + snippetlen / 2
if to > len(s) {
to = len(s)
}
end := strings.LastIndex(s[:to], " ")
if end == -1 {
// OK, look for a longer word
end = strings.Index(s[to:], " ")
if end == -1 {
end = len(s)
} else {
end += to
}
}
t = s[start : end];
res = res + t + " …";
// truncate text to avoid rematching the same string.
s = s[end:]
}
}
return highlight(q, res)
}

snippets_test.go (new file): 27 lines

@@ -0,0 +1,27 @@
package main
import (
"testing"
)
func TestSnippets(t *testing.T) {
s := `We are immersed in a sea of dead people. All the dead that have gone before us, silent now, just staring, gaping. As we move and talk and fret, never once stopping to ask ourselves or them! what it was all about. Instead we drown ourselves in noise. Incessantly we babble, surrounded by false friends claiming that all is well. And look at us! Yes, we are well. Patting our backs and expecting a pat and we do! we smugly do enjoy.`
h := `We are immersed in a sea of dead people. <b>All</b> the dead that have gone before us, silent now, just … to ask ourselves or them! what it was <b>all</b> about. Instead we drown ourselves in no<b>is</b>e. … surrounded by false friends claiming that <b>all</b> <b>is</b> <b>well</b>. And look at us! Yes, we are <b>well</b>. …`
q := "all is well"
r, c := snippets(q, s)
if r != h {
t.Logf("The snippets are wrong in 「%s」", r)
t.Fail()
}
// Score 26:
// - all is well (1)
// - all, beginning, end, whole word (+4 × 3 = 12)
// - is, beginning, end, whole word (+4 × 1 = 4), and as a substring (1)
// - well, beginning, end, whole word (+4 × 2 = 8)
if c != 26 {
t.Logf("%s score is %d", q, c)
t.Fail()
}
}


@@ -7,12 +7,19 @@
<title>{{.Title}}</title>
<style>
html { max-width: 70ch; padding: 2ch; margin: auto; color: #111; background: #ffe; }
form { display: inline-block; padding-left: 1em; }
img { max-width: 100%; }
</style>
</head>
<body>
<h1>{{.Title}}</h1>
<p><a href="/edit/{{.Name}}">Edit this page</a></p>
<div>
<a href="/edit/{{.Name}}">Edit this page</a>
<form role="search" action="/search" method="GET">
<input type="text" spellcheck="false" name="q" required>
<button>Search</button>
</form>
</div>
<div>
{{.Html}}
</div>

wiki.go: 120 lines changed

@@ -1,113 +1,122 @@
package main
import (
"github.com/microcosm-cc/bluemonday"
"github.com/gomarkdown/markdown"
"html/template"
"net/http"
"strings"
"regexp"
"bytes"
"fmt"
"os"
)
var templates = template.Must(template.ParseFiles("edit.html", "view.html"))
// Templates are parsed at startup.
var templates = template.Must(template.ParseFiles("edit.html", "view.html", "search.html"))
var validPath = regexp.MustCompile("^/(edit|save|view)/(([a-z]+/)?[^/]+)$")
// validPath is a regular expression where the second group matches a
// page, so when the handler for "/edit/" is called, a URL path of
// "/edit/foo" results in the editHandler being called with title
// "foo". The regular expression doesn't define the handlers (this
// happens in the main function).
var validPath = regexp.MustCompile("^/([^/]+)/(.+)$")
// titleRegexp is a regular expression matching a level 1 header line
// in a Markdown document. The first group matches the actual text and
is used to provide a title for pages. If no title exists in the
// document, the page name is used instead.
var titleRegexp = regexp.MustCompile("(?m)^#\\s*(.*)\n+")
type Page struct {
Title string
Name string
Body []byte
Html template.HTML
}
func (p *Page) save() error {
filename := p.Name + ".md"
return os.WriteFile(filename, bytes.ReplaceAll(p.Body, []byte{'\r'}, []byte{}), 0600)
}
func loadPage(title string) (*Page, error) {
filename := title + ".md"
body, err := os.ReadFile(filename)
if err != nil {
return nil, err
}
return &Page{Title: title, Name: title, Body: body}, nil
}
func renderTemplate(w http.ResponseWriter, tmpl string, p *Page) {
err := templates.ExecuteTemplate(w, tmpl+".html", p)
// renderTemplate is the helper that is used to render the templates with
// data.
func renderTemplate(w http.ResponseWriter, tmpl string, data any) {
err := templates.ExecuteTemplate(w, tmpl+".html", data)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
}
}
// rootHandler just redirects to /view/index.
func rootHandler(w http.ResponseWriter, r *http.Request) {
http.Redirect(w, r, "/view/index", http.StatusFound)
}
func viewHandler(w http.ResponseWriter, r *http.Request, title string) {
// viewHandler renders a text file, if the name ends in ".txt" and
// such a file exists. Otherwise, it loads the page. If this didn't
// work, the browser is redirected to an edit page. Otherwise, the
// "view.html" template is used to show the rendered HTML.
func viewHandler(w http.ResponseWriter, r *http.Request, name string) {
// Short cut for text files
if (strings.HasSuffix(title, ".txt")) {
body, err := os.ReadFile(title)
if (strings.HasSuffix(name, ".txt")) {
body, err := os.ReadFile(name)
if err == nil {
w.Write(body)
return
}
}
// Attempt to load Markdown page; edit it if this fails
p, err := loadPage(title)
p, err := loadPage(name)
if err != nil {
http.Redirect(w, r, "/edit/"+title, http.StatusFound)
http.Redirect(w, r, "/edit/"+name, http.StatusFound)
return
}
// Render the Markdown to HTML, extracting a title and
// possibly sanitizing it
s := string(p.Body)
m := titleRegexp.FindStringSubmatch(s)
if m != nil {
p.Title = m[1]
p.Body = []byte(strings.Replace(s, m[0], "", 1))
}
maybeUnsafeHTML := markdown.ToHTML(p.Body, nil, nil)
html := bluemonday.UGCPolicy().SanitizeBytes(maybeUnsafeHTML)
p.Html = template.HTML(html);
p.handleTitle(true)
p.renderHtml()
renderTemplate(w, "view", p)
}
func editHandler(w http.ResponseWriter, r *http.Request, title string) {
p, err := loadPage(title)
// editHandler uses the "edit.html" template to present an edit page.
// When editing, the page title is not overridden by a title in the
// text. Instead, the page name is used.
func editHandler(w http.ResponseWriter, r *http.Request, name string) {
p, err := loadPage(name)
if err != nil {
p = &Page{Title: title, Name: title}
p = &Page{Title: name, Name: name}
} else {
p.handleTitle(false)
}
renderTemplate(w, "edit", p)
}
func saveHandler(w http.ResponseWriter, r *http.Request, title string) {
// saveHandler takes the "body" form parameter and saves it. The
// browser is redirected to the page view.
func saveHandler(w http.ResponseWriter, r *http.Request, name string) {
body := r.FormValue("body")
p := &Page{Title: title, Name: title, Body: []byte(body)}
p := &Page{Name: name, Body: []byte(body)}
err := p.save()
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
http.Redirect(w, r, "/view/"+title, http.StatusFound)
http.Redirect(w, r, "/view/"+name, http.StatusFound)
}
// makeHandler returns a handler that uses the URL path without the
// first path element as its argument, e.g. if the URL path is
// /edit/foo/bar, the editHandler is called with "foo/bar" as its
// argument. This uses the second group from the validPath regular
// expression.
func makeHandler(fn func (http.ResponseWriter, *http.Request, string)) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
m := validPath.FindStringSubmatch(r.URL.Path)
if m == nil {
if m != nil {
fn(w, r, m[2])
} else {
http.NotFound(w, r)
return
}
fn(w, r, m[2])
}
}
// searchHandler presents a search result. It uses the query string in
// the form parameter "q" and the template "search.html". For each
// page found, the HTML is just an extract of the actual body.
func searchHandler(w http.ResponseWriter, r *http.Request) {
q := r.FormValue("q")
items := search(q)
s := &Search{Query: q, Items: items, Results: len(items) > 0}
renderTemplate(w, "search", s)
}
// getPort returns the environment variable ODDMU_PORT or the default
// port, "8080".
func getPort() string {
port := os.Getenv("ODDMU_PORT")
if port == "" {
@@ -121,8 +130,9 @@ func main() {
http.HandleFunc("/view/", makeHandler(viewHandler))
http.HandleFunc("/edit/", makeHandler(editHandler))
http.HandleFunc("/save/", makeHandler(saveHandler))
http.HandleFunc("/search", searchHandler)
loadIndex()
port := getPort()
fmt.Println("Serving a wiki on port " + port)
fmt.Printf("Serving a wiki on port %s\n", port)
http.ListenAndServe(":" + port, nil)
}