Pubmed for Handhelds
Title: An implementation of the trigram phrase matching method for text similarity problems.
Author: Tardelli AO, Anção MS, Packer AL, Sigulem D.
Journal: Stud Health Technol Inform; 2004; 103:43-9.
PubMed ID: 15747904.
Abstract: Representing texts as term vectors whose element values are calculated by a TF-IDF method yields significant results in text similarity problems, such as finding related documents in bibliographic or full-text databases, identifying MeSH concepts in medical texts by a lexical approach, harmonizing journal citations in ISI/SciELO references, and normalizing authors' affiliations in MEDLINE. Our work considered "trigrams" as the terms (elements) of the term vector representing a text, following the Trigram Phrase Matching method published by the NLM's Indexing Initiative and its logarithmic Term Frequency-Inverse Document Frequency method for term weighting. Trigrams are overlapping 3-character strings from a text, extracted by a couple of rules, and a trigram matching method may improve the probability of identifying synonymous phrases or similar texts. The matching process was implemented as a simple algorithm that requires a certain amount of computing resources, so an efficiency-focused implementation in C was adopted. In addition, some heuristic rules improved the efficiency of the method and made a regular "find your scientific production in the SciELO collection" information service feasible. We describe an implementation of the Trigram Matching method, the software tool we developed, and a set of experimental parameters for the above results.
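
The general idea in the abstract (character trigrams as vector terms, logarithmic TF-IDF weights, similarity between vectors) can be illustrated with a minimal Python sketch. This is not the authors' C tool, and several details are assumptions made only for illustration: the abstract does not state the extraction rules, so the sketch uses plain overlapping trigrams over lowercased text with punctuation collapsed to spaces; the weighting `(1 + log tf) * (log((1 + N) / (1 + df)) + 1)` is one common smoothed logarithmic TF-IDF variant, not necessarily the Indexing Initiative formula; and cosine similarity is assumed as the comparison measure. The function names `trigrams`, `build_idf`, `vectorize`, and `cosine` are hypothetical.

```python
import math
import re
from collections import Counter


def trigrams(text):
    """Return the overlapping 3-character strings of a normalized text."""
    # Assumed normalization: lowercase, runs of non-word characters -> one space.
    norm = re.sub(r"[\W_]+", " ", text.lower()).strip()
    return [norm[i:i + 3] for i in range(len(norm) - 2)]


def build_idf(docs):
    """Compute a smoothed IDF per trigram; returns (idf_table, number_of_docs)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(trigrams(doc)))  # count each trigram once per document
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return idf, n_docs


def vectorize(text, idf, n_docs):
    """Map a text to a sparse vector {trigram: logarithmic TF-IDF weight}."""
    counts = Counter(trigrams(text))
    unseen_idf = math.log(1 + n_docs) + 1.0  # trigram absent from the collection
    return {t: (1.0 + math.log(tf)) * idf.get(t, unseen_idf)
            for t, tf in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Usage: match a cited journal name against a small list of candidate titles.
candidates = [
    "Stud Health Technol Inform",
    "Studies in Health Technology and Informatics",
    "Journal of Biomedical Informatics",
]
idf, n_docs = build_idf(candidates)
query = vectorize("Stud. Health Technol. Inform.", idf, n_docs)
scores = [cosine(query, vectorize(c, idf, n_docs)) for c in candidates]
best = max(range(len(candidates)), key=scores.__getitem__)
print(candidates[best])
```

Because matching works on character trigrams rather than whole words, abbreviated and punctuated variants of the same journal name still share most of their terms, which is the kind of citation harmonization the abstract mentions. A production service would also need the heuristic pruning the authors describe, which this sketch does not attempt.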