Port all the text researches to tree researches #2078

atimmer · 2018-12-20T12:39:50Z

The following researches exist now. After each one I note what needs to be done for it.

In general I want to:

Drop all the verbs from the research. A research is not something you do, it is an objective fact about the text.
Change count researches into plain researches. The count can easily be retrieved with .length whenever it is needed, and the actual results are far more useful than the count.
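
A minimal sketch of the second point, assuming a hypothetical `images` research that returns one plain object per image. The `src` and `alt` property names are illustrative and not taken from the codebase:

```javascript
// Hypothetical result of an `images` research: one object per image.
// The `src` and `alt` property names are assumptions for illustration.
const images = [
    { src: "cat.png", alt: "A sleeping cat" },
    { src: "dog.png", alt: "" },
    { src: "bird.png", alt: "A bird in flight" },
];

// The assessment derives whatever counts it needs from the results.
const imageCount = images.length;
const imagesWithAlt = images.filter( ( image ) => image.alt.trim() !== "" );
const imagesWithoutAlt = images.filter( ( image ) => image.alt.trim() === "" );

console.log( imageCount, imagesWithAlt.length, imagesWithoutAlt.length );
```

With the full results available, the same research can serve both the old imageCount and altTagCount use cases.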

List of current research

urlLength, unchanged
wordCountInText, change to words research. Count can easily be retrieved by doing .length in the assessment.
findKeywordInPageTitle, rename to keywordInPageTitle.
calculateFleschReading, rename to fleschReadingEase, change it so it uses the new syllableCount research.
getLinkStatistics, drop in favor of links research
getLinks, rename to links, change so it returns a complete object instead of only a URL.
linkCount, drop in favor of links research
imageCount, change to images research. Count can easily be retrieved by doing .length in the assessment. The research should return image objects with all the data for an image, which makes it a lot more useful.
altTagCount, drop in favor of images research. Logic for counting how many images have or lack alt tags should be in the assessment.
matchKeywordInSubheadings, drop in favor of a combination of the headings and keywords research. The assessment can first get references to all the headings in the text and then ask if the keyword has been matched in these headings by calling getResearchForNode( 'keywords', heading ).
keywordCount, change to keywords research. Count can easily be retrieved by doing .length in the assessment. The research should return keyword objects with keyword, startOffset, endOffset, isExactMatch properties.
getKeywordDensity, rename to keywordDensity, change to use the keywords and words research.
stopWordsInKeyword, unchanged
stopWordsInUrl, unchanged
metaDescriptionLength, unchanged
keyphraseLength, unchanged
keywordCountInUrl, unchanged
findKeywordInFirstParagraph, drop in favor of a combination of the paragraphs and keywords research. The assessment can first get a reference to the first paragraph and then ask if the keyword has been matched in this paragraph by calling getResearchForNode( 'keywords', firstParagraph ).
metaDescriptionKeyword, unchanged
pageTitleWidth, unchanged
getWordComplexity, drop in favor of the words research. A word object should include the number of syllables in the word.
getParagraphLength, drop in favor of a combination of the paragraphs and words research. An assessment can get all the paragraphs with the paragraphs research and then retrieve the words for a specific paragraph by calling getResearchForNode( 'words', paragraph ).
countSentencesFromText, change to sentences research.
countSentencesFromDescription, rename to descriptionSentences
getSubheadingTextLengths, drop in favor of a combination of the headings and words research
findTransitionWords, rename to transitionWords
passiveVoice, rename research file to passiveVoice
getSentenceBeginnings, we should probably not change this until we implement the linguistic tree
relevantWords, adapt for the tree, but this is going to require a very specifically tuned recursion strategy. We can also choose to refactor this later and only make this work on the root node for now.
readingTime, unchanged, but adapt for the tree
getTopicDensity, drop in favor of a combination of matchedTopics and words
topicCount, rename to matchedTopics, should return an object with properties matchedTopic, startOffset, endOffset.
sentences, unchanged, but works for the tree.
keyphraseDistribution, refactor to use the new words/keywords/matchedTopics research
morphology/buildKeywordsForms, rename to keywordForms
functionWordsInKeyphrase, unchanged
h1s, drop in favor of the headings research. That one can then easily be filtered based on the level (in this case 1)
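
A rough sketch of how an assessment could combine researches as described in the list above. Only the call shape `getResearchForNode( name, node )` comes from this issue. The tree shape, the `createResearcher` factory, and the research implementations are all hypothetical, and `isExactMatch` is interpreted here as a case-sensitive match, which is an assumption:

```javascript
// Toy tree. The `type`, `level` and `text` fields are assumptions.
const tree = {
    type: "root",
    children: [
        { type: "heading", level: 1, text: "Caring for cats" },
        { type: "paragraph", text: "Cats need food, water and attention." },
        { type: "heading", level: 2, text: "Feeding schedule" },
        { type: "paragraph", text: "Feed adult animals twice a day." },
    ],
};

// Finds every case-insensitive occurrence of the keyword in a text.
function findKeywords( text, keyword ) {
    const matches = [];
    const needle = keyword.toLowerCase();
    if ( needle === "" ) {
        return matches;
    }
    const haystack = text.toLowerCase();
    let index = haystack.indexOf( needle );
    while ( index !== -1 ) {
        const endOffset = index + needle.length;
        matches.push( {
            keyword,
            startOffset: index,
            endOffset,
            // Assumption: "exact" means the casing matches as well.
            isExactMatch: text.slice( index, endOffset ) === keyword,
        } );
        index = haystack.indexOf( needle, endOffset );
    }
    return matches;
}

// Hypothetical factory that wires research names to implementations.
function createResearcher( keyword ) {
    const childrenOfType = ( node, type ) =>
        ( node.children || [] ).filter( ( child ) => child.type === type );
    const researches = {
        headings: ( node ) => childrenOfType( node, "heading" ),
        paragraphs: ( node ) => childrenOfType( node, "paragraph" ),
        keywords: ( node ) => findKeywords( node.text || "", keyword ),
    };
    return {
        getResearchForNode: ( name, node ) => researches[ name ]( node ),
    };
}

const researcher = createResearcher( "cats" );

// Replaces matchKeywordInSubheadings: headings first, then keywords per heading.
const headings = researcher.getResearchForNode( "headings", tree );
const headingsWithKeyword = headings.filter(
    ( heading ) => researcher.getResearchForNode( "keywords", heading ).length > 0
);

// Replaces h1s: filter the headings on level.
const h1s = headings.filter( ( heading ) => heading.level === 1 );

// Replaces findKeywordInFirstParagraph.
const firstParagraph = researcher.getResearchForNode( "paragraphs", tree )[ 0 ];
const keywordInFirstParagraph =
    researcher.getResearchForNode( "keywords", firstParagraph ).length > 0;
```

The point of the sketch is that the assessments stay small: each one picks nodes with one research and asks a second research about those nodes.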

As a result of the above changes we also need to introduce some new research:

words, returns all the words for a specific node. Every word has a
keywords, returns all the keywords for a specific node.
subheadings, returns a reference to all the Heading nodes in the tree.
paragraph, returns a reference to all the Paragraph nodes in the tree.
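
As an illustration of how these new researches could feed a derived research such as keywordDensity, here is a minimal sketch. It assumes density is the number of keyword matches divided by the number of words, times 100. The object shapes (a `text` property on words, offsets on keywords) are assumptions for the example:

```javascript
// Hypothetical results of the `words` and `keywords` researches for one node
// containing the text "Cats sleep a lot and cats purr often".
const words = "Cats sleep a lot and cats purr often"
    .split( " " )
    .map( ( text ) => ( { text } ) );

const keywords = [
    { keyword: "cats", startOffset: 0, endOffset: 4, isExactMatch: false },
    { keyword: "cats", startOffset: 21, endOffset: 25, isExactMatch: true },
];

// Assumed formula: keyword matches as a percentage of all words.
function keywordDensity( wordResults, keywordResults ) {
    if ( wordResults.length === 0 ) {
        return 0;
    }
    return ( keywordResults.length / wordResults.length ) * 100;
}

const density = keywordDensity( words, keywords );
console.log( density );
```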


igorschoester · 2019-02-13T09:14:32Z

I checked the current issues with the label requires structured data / html parser to match them to these researches. The goal is to make it easier to double check if the issues are indeed fixed after/while creating the tree researches.

This is a work in progress. At this time there are 6 issues left to match.

To match still

getSentenceBeginnings

Ignore lists for this research https://github.com/Yoast/YoastSEO.js/issues/1190

headings/subheadings

Match paragraphs between headings: Create a regex that selects all text after a subheading #463
Mark the subheading (text included), currently disabled: Eye Marker Active But Nothing Is Highlighted #1151
Mark headings with attributes properly: H1's in the 'single title' assessment are not marked when it has html-attributes. #1998
Do not mark the first H1 when it is the first content (probably the same issue as the second H1 with the same text, which should be resolved after implementing indices): H1 with the same text is highlighted in beginning of post #2034
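
The duplicate H1 case is easier to see with a small example. This is a sketch only, assuming nodes carry hypothetical `startOffset` and `endOffset` positions into the source HTML and using a generic `<mark>` tag as a stand-in for the real marker:

```javascript
const html = "<h1>Cats</h1><p>Intro.</p><h1>Cats</h1>";

// Text-based marking: the first occurrence is marked, even when the
// assessment meant the second H1.
const markedByText = html.replace( "Cats", "<mark>Cats</mark>" );

// Index-based marking: the node tells us exactly where its text lives.
function markRange( text, startOffset, endOffset ) {
    return text.slice( 0, startOffset ) +
        "<mark>" + text.slice( startOffset, endOffset ) + "</mark>" +
        text.slice( endOffset );
}

// Hypothetical offsets of the text inside the second H1.
const startOffset = html.lastIndexOf( "Cats" );
const secondH1 = { startOffset, endOffset: startOffset + "Cats".length };

const markedByIndex = markRange( html, secondH1.startOffset, secondH1.endOffset );
console.log( markedByText );
console.log( markedByIndex );
```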

getParagraphLength

Mark the paragraph (text included): Eye Marker Active But Nothing Is Highlighted #1151

getSubheadingTextLengths

Re-enable marking and check (the connected assessment has an unclear name, SubheadingsDistributionTooLong; a distribution cannot be too long): Purple view eye icon isn't highlighting the subheading followed by more than the recommended maximum of 300 words. #1195

getKeywordDensity

Not all occurrences are marked + possibly mark sentences instead of the words to be less confusing (see the comment): Keyphrase density marking mismatch between analysis and highlights #2029

keyphraseDistribution

Mark sentence/paragraph with links (text included): Eye marker broken when sentence/paragraph contains a link #1907
Can get picked up after marking keywords works again: Improve marking for keyword distribution assessment #1498

links

Mark links, currently disabled: Eye marker 'You're linking to another page with the focus keyword' not working #1155

images

Check that "cat's" is counted correctly as a keyphrase when used in the image alt tag: Apostrophes fail to be recognized as alt-text in images #1209

atimmer · 2019-03-06T11:18:42Z

List with priority, based on a deliberation with @moorscode.

Sub Headings Keyword

Match paragraphs between headings: Create a regex that selects all text after a subheading #463
Mark the subheading (text included), currently disabled: Eye Marker Active But Nothing Is Highlighted #1151
Mark headings with attributes properly: H1's in the 'single title' assessment are not marked when it has html-attributes. #1998
Do not mark the first H1 when it is the first content (probably the same issue as the second H1 with the same text, which should be resolved after implementing indices): H1 with the same text is highlighted in beginning of post #2034
H1 with the same text is highlighted in beginning of post wordpress-seo#11660
Keyword is highlighted more times than stated in analysis #2155

Internal Links

Mark links, currently disabled: Eye marker 'You're linking to another page with the focus keyword' not working #1155

Text Competing Links (currently disabled)

Mark links, currently disabled: Eye marker 'You're linking to another page with the focus keyword' not working #1155

Keyphrase Distribution

Mark sentence/paragraph with links (text included): Eye marker broken when sentence/paragraph contains a link #1907
Can get picked up after marking keywords works again: Improve marking for keyword distribution assessment #1498

Keyword Density

Not all occurrences are marked + possibly mark sentences instead of the words to be less confusing (see the comment): Keyphrase density marking mismatch between analysis and highlights #2029
Keyword is highlighted more times than stated in analysis #2155

Sentence Beginnings

Ignore lists for this research #1190
Mark sentences only at relevant places #785

Paragraph Too Long

Mark the paragraph (text included): Eye Marker Active But Nothing Is Highlighted #1151

Subheading Distribution Too Long (currently disabled)

Re-enable marking and check (the connected assessment has an unclear name, SubheadingsDistributionTooLong; a distribution cannot be too long): Purple view eye icon isn't highlighting the subheading followed by more than the recommended maximum of 300 words. #1195

Text Images

Check that "cat's" is counted correctly as a keyphrase when used in the image alt tag: Apostrophes fail to be recognized as alt-text in images #1209

Outbound Links
Passive Voice
Sentence Length In Text
Transition Words
Keyphrase Length
Single H1

No marking

Introduction Keyword
Keyword Stop Words
Function Words In Keyphrase
Text Presence
Sentence Length In Description
Flesch Reading Ease
Meta Description Keyword
Meta Description Length
Page Title Width
Taxonomy Text Length
Url Keyword
Url Length
Url Stop Words
Text Length
Title Keyword

Assessment disabled

Word Complexity (assessment disabled)

manuelaugustin · 2019-03-19T13:15:22Z

Other issues that should be solved after the implementation of the tree (not specific to a given assessment, but possibly researches):

Exclude tables: Table elements should be taken into account when getting sentences #820
[Enhancement] If Keyword contains an ending punctuation like period exclamation point or question mark do not count it as a sentence ending #2218
[Enhancement] If Keyword contains an ending punctuation like period exclamation point or question mark be able to find it for Keyword Density wordpress-seo#13726
SentenceTokenizer incorrectly processes punctuation marks within words: Add locale check for fleschreading #402
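
To make the keyword punctuation case concrete, here is one possible approach, sketched under heavy assumptions: protect keyword occurrences with a placeholder before splitting, then restore them. The `splitSentences` function and its naive regex are hypothetical stand-ins and do not reflect how the real SentenceTokenizer works:

```javascript
// Naive sentence splitter that does not treat punctuation inside the
// keyword as a sentence ending. Illustration only.
function splitSentences( text, keyword ) {
    const placeholder = "\u0000";
    // Hide keyword occurrences so their punctuation is not seen by the regex.
    const protectedText = keyword ? text.split( keyword ).join( placeholder ) : text;
    const rawSentences = protectedText.match( /[^.!?]+[.!?]*/g ) || [];
    return rawSentences
        .map( ( sentence ) => sentence.split( placeholder ).join( keyword ).trim() )
        .filter( ( sentence ) => sentence !== "" );
}

const text = "I read Yahoo! every day. It is great.";

const withoutKeyword = splitSentences( text, "" );
const withKeyword = splitSentences( text, "Yahoo!" );

console.log( withoutKeyword.length, withKeyword.length );
```

Without the keyword the exclamation point ends a sentence early, so the text splits into three parts. With the keyword protected it splits into two.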

omarreiss · 2020-08-25T13:47:53Z

Closing all parse tree issues.

atimmer added enhancement owner: business labels Dec 20, 2018

atimmer added this to the StructuredTree milestone Dec 20, 2018

atimmer added the innovation Innovative issue. Relating to performance, memory or data-flow. label Dec 21, 2018

atimmer added the backlog label Feb 1, 2019

manuelaugustin added the component: parse tree label Feb 19, 2019

This was referenced Mar 8, 2019

Port all the text assessments to tree assessments #2192

Closed

Create keyphrases tree research #2193

Closed

Create headings tree research #2194

Closed

omarreiss closed this as completed Aug 25, 2020
