Fetching data from Facebook #20

Open
TWithers opened this issue Jul 18, 2018 · 0 comments

TWithers commented Jul 18, 2018

Awesome library! I like its simplicity and the ability to extend it with my own parsers and whatnot.

I noticed that the default parser doesn't fetch data from Facebook, because Facebook's meta tags use the "name" attribute instead of "property".

A quick fix:

```php
protected function parseHtml($html)
{
    $data = [
        'image' => '',
        'title' => '',
        'description' => ''
    ];

    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return $data;
    }

    // Collect warnings about malformed markup internally instead of emitting them,
    // then restore the caller's libxml error setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new \DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    /** @var \DOMElement $meta */
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Some pages (e.g. Facebook) key their meta tags on "name" rather than "property".
        $key = $meta->hasAttribute('name')
            ? $meta->getAttribute('name')
            : $meta->getAttribute('property');

        // Twitter tags may carry their value in "value" instead of "content".
        $content = $meta->hasAttribute('content')
            ? $meta->getAttribute('content')
            : $meta->getAttribute('value');

        if ($key === 'image' || $key === 'og:image' || $key === 'twitter:image') {
            $data['image'] = $content;
        }

        if ($key === 'name' || $key === 'og:title' || $key === 'twitter:title') {
            $data['title'] = $content;
        }

        // A "description" tag keyed on "name" is caught here, in the same pass.
        if ($key === 'description' || $key === 'og:description') {
            $data['description'] = $content;
        }
    }

    // Fall back to the <title> element when no meta tag supplied a title.
    if (empty($data['title'])) {
        /** @var \DOMElement $title */
        foreach ($doc->getElementsByTagName('title') as $title) {
            $data['title'] = $title->nodeValue;
        }
    }

    return $data;
}
```

I double-checked this against this Go scraper, which does the same thing: https://github.com/badoux/goscraper/blob/master/goscraper.go

This also fixes an issue where the title was overwritten by the description because of a typo, and it removes the need to loop through the meta tags a second time to find a description when the meta tag attribute is "name" instead of "property".
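To try the name/property lookup outside the library, here is a minimal standalone sketch. `extractPreview()` and the `$fields` table are hypothetical names, not part of the library's API; the table stands in for the if/elseif chains but keeps the same keys, and it assumes `content` is read first with a fallback to `value`.

```php
<?php
// Hypothetical standalone helper, not the library's method: resolves each
// meta tag's key from "name" when present, otherwise from "property".
function extractPreview(string $html): array
{
    $data = ['image' => '', 'title' => '', 'description' => ''];
    if (trim($html) === '') {
        return $data; // loadHTML() rejects empty input
    }

    // Silence libxml warnings for sloppy markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Each recognised meta key mapped to the field it fills.
    $fields = [
        'image' => 'image', 'og:image' => 'image', 'twitter:image' => 'image',
        'name' => 'title', 'og:title' => 'title', 'twitter:title' => 'title',
        'description' => 'description', 'og:description' => 'description',
    ];

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $key = $meta->hasAttribute('name')
            ? $meta->getAttribute('name')
            : $meta->getAttribute('property');
        if (isset($fields[$key])) {
            // Prefer "content"; fall back to "value" used by some Twitter tags.
            $data[$fields[$key]] = $meta->hasAttribute('content')
                ? $meta->getAttribute('content')
                : $meta->getAttribute('value');
        }
    }

    // Use the <title> element only if no meta tag provided a title.
    if ($data['title'] === '') {
        foreach ($doc->getElementsByTagName('title') as $title) {
            $data['title'] = $title->nodeValue;
        }
    }
    return $data;
}

// Sample markup: the description is keyed on "name", the image on "property".
$html = <<<HTML
<html><head>
<title>Fallback title</title>
<meta name="description" content="Shared from a page">
<meta property="og:image" content="https://example.com/a.png">
</head><body></body></html>
HTML;

$preview = extractPreview($html);
print_r($preview);
```

The sample mixes a "name"-keyed tag with a "property"-keyed one, so both branches of the key lookup run in a single pass over the meta tags. Keeping the keys in one array also means supporting another tag is a one-line change.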
