Go back to UTF-8 strings because of Project Gutenberg license (also, just not a good idea to change them)
juliasilge committed Apr 12, 2016
1 parent dfdde47 commit 164276c
Showing 4 changed files with 9 additions and 11 deletions.
11 changes: 9 additions & 2 deletions cran-comments.md
@@ -9,9 +9,16 @@ This is the first attempted CRAN release of janeaustenr, and my first submission
 
 ## R CMD check results
 
-0 errors | 0 warnings | 0 note
+0 errors | 0 warnings | 1 note
 
-There was a message about possibly invalid URLs for the Project Gutenberg URLs in the .Rd files, and about possibly mis-spelled words in DESCRIPTION (Austen's at 2:30 and 6:34, Northanger at 8:32).
+There was 1 note about non-ASCII, marked UTF-8 strings; there are 2 in the data sets.
+
+* `mansfieldpark[14652]` has a British pound symbol.
+* `persuasion[7066]` has an e with an accent grave (in the word "arrangè")
+
+I believe it would violate Project Gutenberg's license to change these in the texts, so I would like to keep them as is.
+
+Also, there was a message about possibly invalid URLs for the Project Gutenberg URLs in the .Rd files, and about possibly mis-spelled words in DESCRIPTION (Austen's at 2:30 and 6:34, Northanger at 8:32).
 
 * Project Gutenberg blocks automated traffic, which caused the issue with the possibly invalid URLs.
 * Those words are spelled correctly.
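The note concerns non-ASCII strings that R marks as UTF-8. A minimal sketch of how such elements could be located in a character vector is below; it is not code from this commit, and the helper `find_non_ascii()` and the short `example` vector are hypothetical. It relies on base R's `iconv()`, which returns `NA` for any element that cannot be converted to ASCII, and on `Encoding()`, which reports `"UTF-8"` for strings marked as UTF-8 (ASCII-only strings are never marked).

```r
# Hypothetical helper (not part of janeaustenr): return the indices of
# elements containing characters that cannot be represented in ASCII.
find_non_ascii <- function(x) {
  # iconv() yields NA where conversion fails (its default is sub = NA);
  # elements that are already NA are reported as well.
  which(is.na(iconv(x, from = "UTF-8", to = "ASCII")))
}

# Hypothetical example strings, not the actual lines from the novels
example <- c(
  "plain ASCII text",
  "a fortune of \u00a320,000",         # British pound symbol
  "so formal and arrang\u00e8"         # e with a grave accent
)

find_non_ascii(example)
#> [1] 2 3

# Strings built with \u escapes are marked as UTF-8; ASCII strings are not
sum(Encoding(example) == "UTF-8")
#> [1] 2
```

Run against `mansfieldpark` and `persuasion`, the same check should point at the two elements listed in the note, assuming the data sets are built as in `data-raw/prep_data.R`.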
9 changes: 0 additions & 9 deletions data-raw/prep_data.R
@@ -19,13 +19,9 @@ prideprejudice <- read_lines("http://www.gutenberg.org/cache/epub/1342/pg1342.tx
 prideprejudice <- prideprejudice[1:(length(prideprejudice) - 366)]
 prideprejudice <- prideprejudice[!is.na(prideprejudice)]
 
-## Mansfield Park has one line with a non-ASCII character (a British pound
-## symbol); let's edit it for CRAN
-
 mansfieldpark <- read_lines("http://www.gutenberg.org/cache/epub/141/pg141.txt", skip = 29)
 mansfieldpark <- mansfieldpark[1:(length(mansfieldpark) - 367)]
 mansfieldpark <- mansfieldpark[!is.na(mansfieldpark)]
-mansfieldpark[14652] <- "the command of her beauty, and her 20,000 pounds, any one who could satisfy the"
 
 emma <- read_lines("http://www.gutenberg.org/cache/epub/158/pg158.txt", skip = 29)
 emma <- emma[1:(length(emma) - 367)]
@@ -35,14 +31,9 @@ northangerabbey <- read_lines("http://www.gutenberg.org/cache/epub/121/pg121.txt
 northangerabbey <- northangerabbey[1:(length(northangerabbey) - 383)]
 northangerabbey <- northangerabbey[!is.na(northangerabbey)]
 
-## Persuasion also has a line with a non-ASCII character (e with an accent);
-## let's edit it for CRAN
-
 persuasion <- read_lines("http://www.gutenberg.org/cache/epub/105/pg105.txt", skip = 35)
 persuasion <- persuasion[1:(length(persuasion) - 371)]
 persuasion <- persuasion[!is.na(persuasion)]
-persuasion[7066] <- "concert. Something so formal and _arrange_ in her air! and she sits so"
-
 
 ## Now, add the data files to the package
 
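Each novel in `prep_data.R` goes through the same three steps: read the Project Gutenberg text with `read_lines()` while skipping the header, drop a fixed number of footer lines, and remove `NA` elements. A minimal sketch of how that repetition could be factored out follows; `trim_gutenberg()` and `read_gutenberg()` are hypothetical helpers, not part of the package, and the skip and footer counts still have to be worked out per book as in the original script.

```r
# Hypothetical helper: drop the last `footer_lines` elements (the Project
# Gutenberg license footer) and any NA elements, mirroring the subsetting
# applied to each novel in prep_data.R.
trim_gutenberg <- function(lines, footer_lines) {
  # Fail early with a clear message instead of an error inside seq_len()
  stopifnot(footer_lines >= 0, footer_lines <= length(lines))
  lines <- lines[seq_len(length(lines) - footer_lines)]
  lines[!is.na(lines)]
}

# Hypothetical wrapper: download, skip the header, then trim.
# readr is only required when this function is actually called.
read_gutenberg <- function(url, skip, footer_lines) {
  trim_gutenberg(readr::read_lines(url, skip = skip), footer_lines)
}

# Offline check on a toy vector: two footer lines and one NA are removed
toy <- c("CHAPTER I", NA, "It is a truth universally acknowledged",
         "End of the Project Gutenberg EBook", "License text")
trim_gutenberg(toy, footer_lines = 2)
```

For example, `read_gutenberg("http://www.gutenberg.org/cache/epub/141/pg141.txt", skip = 29, footer_lines = 367)` would repeat the Mansfield Park steps from the script, assuming the remote file has not changed since the counts were chosen.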
Binary file modified data/mansfieldpark.rda
Binary file not shown.
Binary file modified data/persuasion.rda
Binary file not shown.
