Obtaining sectioned article text #68

Open
Shadonar opened this issue Dec 3, 2014 · 1 comment

Shadonar commented Dec 3, 2014

Hello,

I have a project that I was hoping to use Scrapely for. From what I've read, it sounds like what I need, but I have run into a problem. When I pass a URL that contains sectioned article text (which appears to be almost all of my URLs), I only receive the first section of the text.

Here's a site that I tried: http://www.autostraddle.com/12-black-friday-deals-you-can-get-without-having-to-put-pants-on-266850/

And here's what I used to train Scrapely:

{'title':'15 Things You Learn When You Move In With Your Girlfriend', 'author': 'by Kate', 'postdate':'November 10, 2014 at 9:00am PST', 'count':'82', 'content':'There comes a point in every relationship when it makes sense for you to think about cohabitation.'}

If I then have Scrapely scrape that same URL, it only gives me that first paragraph.

So my question is: how would I get Scrapely to obtain all of the article's main text (basically the text between the social media icons)?

Any help would be greatly appreciated!

Thanks

Contributor

kalessin commented Dec 4, 2014

Hi Shadonar,

Try passing a list as the 'content' value, containing two elements: the content of the first paragraph and the content of the last one. That way you train the algorithm to perform an iterated extraction over all of them.
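
A minimal sketch of that suggestion, assuming Scrapely's documented `Scraper.train(url, data)` and `Scraper.scrape(url)` calls. The helper names `build_training_data` and `train_and_scrape` are hypothetical, and the last-paragraph string is a placeholder that has to be copied from the real page:

```python
def build_training_data(first_paragraph, last_paragraph, **other_fields):
    """Return training data whose 'content' value is a two-element list."""
    data = dict(other_fields)
    # A list value annotates more than one region for the same field:
    # here, the first and the last paragraph of the article body.
    data['content'] = [first_paragraph, last_paragraph]
    return data


def train_and_scrape(url, data):
    """Train a Scraper on `url` with `data`, then scrape the same page."""
    from scrapely import Scraper  # third-party: pip install scrapely
    scraper = Scraper()
    scraper.train(url, data)
    return scraper.scrape(url)


training_data = build_training_data(
    'There comes a point in every relationship when it makes sense '
    'for you to think about cohabitation.',
    'LAST PARAGRAPH TEXT GOES HERE',  # placeholder: copy it from the page
    title='15 Things You Learn When You Move In With Your Girlfriend',
    author='by Kate',
)
```

Calling `train_and_scrape(url, training_data)` with the URL of the page the text was copied from needs network access. Whether the paragraphs in between come back depends on how regular the page markup is, so it is worth checking the result against the article by hand.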
