xapian-core  1.5.1
Xapian::Utf8Iterator Class Reference

An iterator which returns Unicode character values from a UTF-8 encoded string.

#include <unicode.h>

Public Types

typedef std::input_iterator_tag iterator_category
 We implement the semantics of an STL input_iterator.
typedef unsigned value_type
typedef size_t difference_type
typedef value_type * pointer
typedef value_type reference

Public Member Functions

const char * raw () const
 Return the raw const char* pointer for the current position.
size_t left () const
 Return the number of bytes left in the iterator's buffer.
void assign (const char *p_, size_t len)
 Assign a new string to the iterator.
void assign (std::string_view s)
 Assign a new string to the iterator.
 Utf8Iterator (const char *p_, size_t len)
 Create an iterator given a pointer and a length.
 Utf8Iterator (std::string_view s)
 Create an iterator given a string.
 Utf8Iterator () noexcept
 Create an iterator which is at the end of its iteration.
unsigned operator* () const noexcept
 Get the current Unicode character value pointed to by the iterator.
Utf8Iterator operator++ (int)
 Move forward to the next Unicode character.
Utf8Iterator & operator++ ()
 Move forward to the next Unicode character.
bool operator== (const Utf8Iterator &other) const noexcept
 Test two Utf8Iterators for equality.
bool operator!= (const Utf8Iterator &other) const noexcept
 Test two Utf8Iterators for inequality.

Detailed Description

An iterator which returns Unicode character values from a UTF-8 encoded string.

Constructor & Destructor Documentation

◆ Utf8Iterator() [1/3]

Xapian::Utf8Iterator::Utf8Iterator (const char *p_, size_t len)  [inline]

Create an iterator given a pointer and a length.

The iterator will return characters from the start of the string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.

Parameters
  p_   A pointer to the start of the string to read.
  len  The length of the string to read.

References assign().

◆ Utf8Iterator() [2/3]

Xapian::Utf8Iterator::Utf8Iterator (std::string_view s)  [inline, explicit]

Create an iterator given a string.

The iterator will return characters from the start of the string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.

Parameters
  s    The string to read. Must not be modified while the iteration is in progress.

This parameter is of type std::string_view, so you can pass in types which automatically convert to that such as std::string, or a const char* pointing to a nul-terminated string.
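
The two properties described above (no copy is made, and a const char* is measured up to its nul terminator) come from std::string_view itself. The following minimal sketch uses only the standard library to show them; the helper names view_shares_storage and length_via_c_string are hypothetical and not part of Xapian.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: true if a string_view made from `text` points into
// text's own buffer, i.e. the characters were not copied.
bool view_shares_storage(const std::string& text) {
    std::string_view view = text;
    return view.data() == text.data() && view.size() == text.size();
}

// Hypothetical helper: the length seen when a const char* is converted to
// std::string_view. The conversion measures up to the first nul byte.
std::size_t length_via_c_string(const char* p) {
    return std::string_view(p).size();
}
```

Because the view only borrows the buffer, the std::string should live in a named variable that outlives the iteration; a temporary std::string is destroyed at the end of the full expression that created it. Data containing embedded nul bytes would be cut short by the const char* conversion, so the pointer-and-length constructor is the safer choice there.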

References assign().

◆ Utf8Iterator() [3/3]

Xapian::Utf8Iterator::Utf8Iterator ()  [inline, noexcept]

Create an iterator which is at the end of its iteration.

This can be compared to another iterator to check if the other iterator has reached its end.
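
In practice this means a default-constructed iterator serves as the end sentinel of a loop such as `for (Xapian::Utf8Iterator it(text); it != Xapian::Utf8Iterator(); ++it)`. The sketch below writes that loop as a template so that it compiles without Xapian installed; collect_values is a hypothetical helper, not part of the library, and it relies only on the operator*, operator++ and operator!= members documented on this page.

```cpp
#include <vector>

// Hypothetical helper (not part of Xapian): gather every value produced by
// an input iterator until it compares equal to `end`.
template <typename Iterator>
std::vector<unsigned> collect_values(Iterator it, Iterator end) {
    std::vector<unsigned> values;
    for (; it != end; ++it) {
        values.push_back(*it);  // for a Utf8Iterator, the current character value
    }
    return values;
}
```

With Xapian available, the expected call would be `collect_values(Xapian::Utf8Iterator(text), Xapian::Utf8Iterator())`; the constructor taking std::string_view is explicit, so the iterator type has to be spelled out.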

Member Function Documentation

◆ assign() [1/2]

void Xapian::Utf8Iterator::assign (const char *p_, size_t len)  [inline]

Assign a new string to the iterator.

The iterator will forget the string it was iterating through, and return characters from the start of the new string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.

Parameters
  p_   A pointer to the start of the string to read.
  len  The length of the string to read.

Referenced by Utf8Iterator(), and Utf8Iterator().

◆ assign() [2/2]

void Xapian::Utf8Iterator::assign (std::string_view s)  [inline]

Assign a new string to the iterator.

The iterator will forget the string it was iterating through, and return characters from the start of the new string when next called. The string is not copied into the iterator, so it must remain valid while the iteration is in progress.

Parameters
  s    The string to read. Must not be modified while the iteration is in progress.

References assign().

Referenced by assign().

◆ operator!=()

bool Xapian::Utf8Iterator::operator!= (const Utf8Iterator &other) const  [inline, noexcept]

Test two Utf8Iterators for inequality.

Parameters
  other  The Utf8Iterator to compare this one with.
Returns
true iff the iterators do not point to the same position.

◆ operator*()

unsigned Xapian::Utf8Iterator::operator* () const  [noexcept]

Get the current Unicode character value pointed to by the iterator.

If an invalid UTF-8 sequence is encountered, then the byte values comprising it are returned until valid UTF-8 or the end of the input is reached.

This handling applies to invalid byte sequences, truncated UTF-8 sequences, overlong sequences and (since Xapian 2.0.0) surrogate pair codepoints encoded as UTF-8.

If you want to reject or otherwise discriminate invalid UTF-8 sequences then see the strict_deref() method.

Returns unsigned(-1) if the iterator has reached the end of its buffer.
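
To make the fallback rule concrete, here is a minimal standard-library sketch of a decoder that follows the behaviour described above: well-formed sequences yield their code point, and each byte of anything else is returned as its own value. It is an illustration under stated assumptions, not Xapian's implementation; decode_lenient and valid_sequence_length are hypothetical names, and the sketch applies the surrogate rule that the text attributes to Xapian 2.0.0.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: length of the well-formed UTF-8 sequence starting at
// s[i], or 0 if the bytes there are invalid, truncated, overlong, a UTF-8
// encoded surrogate, or beyond U+10FFFF.
static std::size_t valid_sequence_length(std::string_view s, std::size_t i) {
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    auto is_cont = [&](std::size_t k) {
        return k < s.size() && (byte(k) & 0xC0) == 0x80;
    };
    unsigned char b0 = byte(i);
    if (b0 < 0x80) return 1;   // ASCII
    if (b0 < 0xC2) return 0;   // stray continuation byte or overlong lead
    if (b0 < 0xE0) return is_cont(i + 1) ? 2 : 0;
    if (b0 < 0xF0) {
        if (!is_cont(i + 1) || !is_cont(i + 2)) return 0;
        if (b0 == 0xE0 && byte(i + 1) < 0xA0) return 0;   // overlong
        if (b0 == 0xED && byte(i + 1) >= 0xA0) return 0;  // surrogate
        return 3;
    }
    if (b0 < 0xF5) {
        if (!is_cont(i + 1) || !is_cont(i + 2) || !is_cont(i + 3)) return 0;
        if (b0 == 0xF0 && byte(i + 1) < 0x90) return 0;   // overlong
        if (b0 == 0xF4 && byte(i + 1) >= 0x90) return 0;  // above U+10FFFF
        return 4;
    }
    return 0;                  // 0xF5..0xFF never start a sequence
}

// Hypothetical helper: decode s, returning byte values for bad input.
std::vector<unsigned> decode_lenient(std::string_view s) {
    std::vector<unsigned> out;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t len = valid_sequence_length(s, i);
        unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (len <= 1) {        // ASCII, or one byte of a bad sequence
            out.push_back(b0);
            ++i;
            continue;
        }
        unsigned cp = b0 & (0xFFu >> (len + 1));  // payload bits of lead byte
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3Fu);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Under these assumptions the overlong pair C0 80 comes back as the two values 0xC0 and 0x80, and a lone C3 at the end of the input comes back as 0xC3. A caller cannot tell such values apart from genuine code points in the same range, which is the case the strict_deref() method mentioned above is meant for.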

◆ operator++() [1/2]

Utf8Iterator & Xapian::Utf8Iterator::operator++ ()  [inline]

Move forward to the next Unicode character.

Returns
A reference to this object.

◆ operator++() [2/2]

Utf8Iterator Xapian::Utf8Iterator::operator++ (int)  [inline]

Move forward to the next Unicode character.

Returns
An iterator pointing to the position before the move.

◆ operator==()

bool Xapian::Utf8Iterator::operator== (const Utf8Iterator &other) const  [inline, noexcept]

Test two Utf8Iterators for equality.

Parameters
  other  The Utf8Iterator to compare this one with.
Returns
true iff the iterators point to the same position.

The documentation for this class was generated from the following file: