xapian-core 1.5.1
unicode.h File Reference

Unicode and UTF-8 related classes and functions.

Classes

class  Xapian::Utf8Iterator
 An iterator which returns Unicode character values from a UTF-8 encoded string.
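As a rough illustration of what iterating over a UTF-8 string involves, the sketch below decodes a string into a vector of code points. It is not Xapian's implementation: the function name `decode_utf8` is hypothetical, and returning a byte that does not start a valid sequence as its own value (as if it were ISO-8859-1) is an assumption of this sketch. In application code, Xapian::Utf8Iterator is the class that does this job.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical sketch of UTF-8 decoding; NOT Xapian's Utf8Iterator code.
// Assumption: an invalid or truncated sequence yields its first byte as a
// code point (ISO-8859-1 style) and decoding resumes at the next byte.
// Overlong forms and surrogate code points are not rejected, for brevity.
std::vector<unsigned> decode_utf8(std::string_view s) {
    std::vector<unsigned> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned ch;
        if (b0 < 0x80) { len = 1; ch = b0; }                  // 0xxxxxxx
        else if ((b0 & 0xE0) == 0xC0) { len = 2; ch = b0 & 0x1F; } // 110xxxxx
        else if ((b0 & 0xF0) == 0xE0) { len = 3; ch = b0 & 0x0F; } // 1110xxxx
        else if ((b0 & 0xF8) == 0xF0) { len = 4; ch = b0 & 0x07; } // 11110xxx
        else { len = 0; ch = b0; }  // stray continuation byte or invalid lead

        // The lead byte must be followed by len - 1 bytes of form 10xxxxxx.
        bool ok = len > 0 && i + len <= s.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else ch = (ch << 6) | (b & 0x3F);
        }

        if (ok) { out.push_back(ch); i += len; }
        else { out.push_back(b0); ++i; }  // fallback: emit the single byte
    }
    return out;
}
```

For example, the three bytes E2 82 AC decode to the single code point U+20AC (EURO SIGN).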

Namespaces

namespace  Xapian
 The Xapian namespace contains public interfaces for the Xapian library.
namespace  Xapian::Unicode
 Functions associated with handling Unicode characters.

Enumerations

enum  Xapian::Unicode::category {
  Xapian::Unicode::UNASSIGNED , Xapian::Unicode::UPPERCASE_LETTER , Xapian::Unicode::LOWERCASE_LETTER , Xapian::Unicode::TITLECASE_LETTER ,
  Xapian::Unicode::MODIFIER_LETTER , Xapian::Unicode::OTHER_LETTER , Xapian::Unicode::NON_SPACING_MARK , Xapian::Unicode::ENCLOSING_MARK ,
  Xapian::Unicode::COMBINING_SPACING_MARK , Xapian::Unicode::DECIMAL_DIGIT_NUMBER , Xapian::Unicode::LETTER_NUMBER , Xapian::Unicode::OTHER_NUMBER ,
  Xapian::Unicode::SPACE_SEPARATOR , Xapian::Unicode::LINE_SEPARATOR , Xapian::Unicode::PARAGRAPH_SEPARATOR , Xapian::Unicode::CONTROL ,
  Xapian::Unicode::FORMAT , Xapian::Unicode::PRIVATE_USE , Xapian::Unicode::SURROGATE , Xapian::Unicode::CONNECTOR_PUNCTUATION ,
  Xapian::Unicode::DASH_PUNCTUATION , Xapian::Unicode::OPEN_PUNCTUATION , Xapian::Unicode::CLOSE_PUNCTUATION , Xapian::Unicode::INITIAL_QUOTE_PUNCTUATION ,
  Xapian::Unicode::FINAL_QUOTE_PUNCTUATION , Xapian::Unicode::OTHER_PUNCTUATION , Xapian::Unicode::MATH_SYMBOL , Xapian::Unicode::CURRENCY_SYMBOL ,
  Xapian::Unicode::MODIFIER_SYMBOL , Xapian::Unicode::OTHER_SYMBOL
}
 Each Unicode character is in exactly one of these categories.
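Because each character belongs to exactly one category, a predicate over a set of categories reduces to a bitmask test. The sketch below uses its own `sketch::Category` enum mirroring the order listed above (assumption: consecutive values starting at 0) and hypothetical helpers `is_wordchar_category()` and `is_currency_category()`. Treating letters, marks, numbers and connector punctuation as word characters is an assumption of this sketch, not a quotation of how `is_wordchar()` is defined.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical mirror of Xapian::Unicode::category, in the order listed above.
// Assumption: enumerators take consecutive values starting at 0.
enum Category : unsigned {
    UNASSIGNED, UPPERCASE_LETTER, LOWERCASE_LETTER, TITLECASE_LETTER,
    MODIFIER_LETTER, OTHER_LETTER, NON_SPACING_MARK, ENCLOSING_MARK,
    COMBINING_SPACING_MARK, DECIMAL_DIGIT_NUMBER, LETTER_NUMBER, OTHER_NUMBER,
    SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL,
    FORMAT, PRIVATE_USE, SURROGATE, CONNECTOR_PUNCTUATION,
    DASH_PUNCTUATION, OPEN_PUNCTUATION, CLOSE_PUNCTUATION,
    INITIAL_QUOTE_PUNCTUATION, FINAL_QUOTE_PUNCTUATION, OTHER_PUNCTUATION,
    MATH_SYMBOL, CURRENCY_SYMBOL, MODIFIER_SYMBOL, OTHER_SYMBOL
};

// 30 categories fit in one 32-bit mask: bit n is set iff category n is in the set.
constexpr std::uint32_t bit(Category c) { return std::uint32_t{1} << c; }

// Assumed set of "word character" categories: letters, marks, numbers and
// connector punctuation (such as '_').
constexpr std::uint32_t WORDCHAR_MASK =
    bit(UPPERCASE_LETTER) | bit(LOWERCASE_LETTER) | bit(TITLECASE_LETTER) |
    bit(MODIFIER_LETTER) | bit(OTHER_LETTER) |
    bit(NON_SPACING_MARK) | bit(ENCLOSING_MARK) | bit(COMBINING_SPACING_MARK) |
    bit(DECIMAL_DIGIT_NUMBER) | bit(LETTER_NUMBER) | bit(OTHER_NUMBER) |
    bit(CONNECTOR_PUNCTUATION);

// Membership is one shift and one AND, with no per-category branching.
constexpr bool is_wordchar_category(Category c) {
    return ((WORDCHAR_MASK >> c) & 1u) != 0;
}

// A single-category predicate is just an equality test.
constexpr bool is_currency_category(Category c) {
    return c == CURRENCY_SYMBOL;
}

}  // namespace sketch
```

A full character-level predicate would first look up the character's category (the role `get_category()` plays in this API) and then apply a test like the ones above.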

Functions

unsigned Xapian::Unicode::nonascii_to_utf8 (unsigned ch, char *buf)
 Convert a single non-ASCII Unicode character to UTF-8.
unsigned Xapian::Unicode::to_utf8 (unsigned ch, char *buf)
 Convert a single Unicode character to UTF-8.
void Xapian::Unicode::append_utf8 (std::string &s, unsigned ch)
 Append the UTF-8 representation of a single Unicode character to a std::string.
category Xapian::Unicode::get_category (unsigned ch)
 Return the category which a given Unicode character falls into.
bool Xapian::Unicode::is_wordchar (unsigned ch)
 Test if a given Unicode character is a "word character".
bool Xapian::Unicode::is_whitespace (unsigned ch)
 Test if a given Unicode character is a whitespace character.
bool Xapian::Unicode::is_currency (unsigned ch)
 Test if a given Unicode character is a currency symbol.
unsigned Xapian::Unicode::tolower (unsigned ch)
 Convert a Unicode character to lowercase.
unsigned Xapian::Unicode::toupper (unsigned ch)
 Convert a Unicode character to uppercase.
std::string Xapian::Unicode::tolower (std::string_view term)
 Convert a UTF-8 string to lowercase.
std::string Xapian::Unicode::toupper (std::string_view term)
 Convert a UTF-8 string to uppercase.
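The sketch below shows the standard UTF-8 encoding that functions shaped like `to_utf8(unsigned ch, char *buf)` and `append_utf8(std::string &s, unsigned ch)` perform: write 1 to 4 bytes into a caller-supplied buffer and return the byte count, or append those bytes to a string. The functions live in a hypothetical `sketch` namespace and are not Xapian's code; the 4-byte buffer (the longest UTF-8 sequence) and the absence of any validation of `ch` are assumptions of this sketch.

```cpp
#include <string>

namespace sketch {

// Encode a code point >= 0x80 into buf, which must hold at least 4 bytes.
// Returns the number of bytes written. Assumes ch <= 0x10FFFF (not checked).
inline unsigned nonascii_to_utf8(unsigned ch, char* buf) {
    if (ch < 0x800) {                          // 110xxxxx 10xxxxxx
        buf[0] = static_cast<char>(0xC0 | (ch >> 6));
        buf[1] = static_cast<char>(0x80 | (ch & 0x3F));
        return 2;
    }
    if (ch < 0x10000) {                        // 1110xxxx + 2 continuation bytes
        buf[0] = static_cast<char>(0xE0 | (ch >> 12));
        buf[1] = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
        buf[2] = static_cast<char>(0x80 | (ch & 0x3F));
        return 3;
    }
    // 11110xxx + 3 continuation bytes
    buf[0] = static_cast<char>(0xF0 | (ch >> 18));
    buf[1] = static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
    buf[2] = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    buf[3] = static_cast<char>(0x80 | (ch & 0x3F));
    return 4;
}

// ASCII fast path: one byte equal to the code point; otherwise delegate.
inline unsigned to_utf8(unsigned ch, char* buf) {
    if (ch < 0x80) {
        buf[0] = static_cast<char>(ch);
        return 1;
    }
    return nonascii_to_utf8(ch, buf);
}

// Encode into a small stack buffer, then append only the bytes written.
inline void append_utf8(std::string& s, unsigned ch) {
    char buf[4];
    s.append(buf, to_utf8(ch, buf));
}

}  // namespace sketch
```

Splitting out the non-ASCII case keeps the common one-byte path short. A string case conversion such as `tolower(std::string_view)` can then be understood as decoding each character, mapping it with the single-character `tolower()`, and appending the result with `append_utf8()`.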

Detailed Description

Unicode and UTF-8 related classes and functions.