[Biodevelopers] PubMed and UTF-8

Martin Kucej Martin.Kucej at UTSouthwestern.edu
Mon Mar 10 16:16:57 EDT 2008


Hi:

I wonder if someone here knows how PubMed matches records containing accented (UTF-8) characters when the search query is plain ASCII. This is what I found:

"All PubMed searching for terms containing diacritical marks ignores those marks, even if users enter them in a search query box (by cutting and pasting or by direct entry). Therefore, searches that include diacritics will retrieve results for terms that include the diacritic as well as terms that do not."

This part is easy: an accented query is translated into plain text. But:

"If you search with plain letters, your retrieval will include results for terms with the diacritic as well as those without. In other words, search results are diacritics-neutral."

Now, how does this work? Does PubMed keep ASCII copies of UTF-8 records in the database?
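
To make the question concrete, here is a minimal Python sketch of one way this could work. It is only a guess, not PubMed's actual implementation: instead of keeping a full ASCII copy of every record, the index stores each term in a folded form (diacritics stripped, lowercased), and the query is folded the same way before lookup. The names fold, build_index, and search, and the sample records, are made up for illustration.

```python
import unicodedata
from collections import defaultdict


def fold(text):
    """Strip combining diacritical marks and lowercase the text."""
    # NFKD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def build_index(records):
    """Map each folded word to the set of record IDs containing it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.split():
            index[fold(word)].add(rec_id)
    return index


def search(index, query):
    """Return IDs of records that contain every (folded) query word."""
    hits = [index.get(fold(word), set()) for word in query.split()]
    return set.intersection(*hits) if hits else set()


# Hypothetical records; the original UTF-8 text is stored only once.
records = {
    1: "Müller mitochondria",
    2: "Muller mitochondria",
    3: "Kučej yeast",
}
index = build_index(records)

print(search(index, "muller"))  # plain query
print(search(index, "Müller"))  # accented query
```

In this sketch both queries return records 1 and 2, so the results are diacritics-neutral with only a folded index, not a second copy of the records. Note that this kind of folding only handles letters that decompose into a base letter plus a combining mark; letters such as "ø" or "ł" would need an explicit mapping table.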

Martin


