- Other Frequently Asked Questions
This page collects "other", uncategorised FAQs. Problems from a new area usually land here first. Once a topic has accumulated enough Q&As, they are moved to a page dedicated to that area.
The full catalogue is the main entry point for most readers. It can be found here: FAQ Catalogue
please see here
If you are using Jupyter for the first time, create a virtual environment first with `pyvenv venv` (on recent Python 3 versions, where `pyvenv` is no longer available, use `python3 -m venv venv`). Then enter the `venv` environment by typing `source venv/bin/activate`. This is where you should install all the modules you will use. After that, start Jupyter with the command `jupyter notebook`. Once you finish your work, use `deactivate` to exit the `venv` environment. Remember that you only need to create the virtual environment once; the next time you use Jupyter, typing `source venv/bin/activate` followed by `jupyter notebook` is enough.
- Exit Python interactive mode: `quit()`, `exit()`, or Control-D on Mac, Control-Z followed by Enter on Windows.
- Auto-completion when executing code files: type the first letter of the file name, then press `tab`
- Indent: `tab`
- Indent back: `shift` + `tab`
- Comment a single line: put `#` at the start of the line
- Comment multiple lines: select the lines, then press `command` + `/`; press again to un-comment
- Open the Force Quit box: `command` + `option` + `esc`
While coding, we often rename variables or change the order of the cells. The kernel still remembers the values from earlier runs, so this can lead to errors that are hard to explain. A good practice is therefore to restart the kernel whenever you encounter weird errors.
If you download a csv file from somewhere and want to import it in a Jupyter notebook, first put the csv file in the folder that contains your venv folder. Usually this is your user folder. All the files you read and write should be in this folder. Another method is to type `pwd` in Jupyter: it returns the path of the folder where you should put the file.
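The same check can be done from Python code in a notebook cell. This is a minimal sketch; `name_list.csv` is a hypothetical file name used only for illustration.

```python
import os

# Folder that relative paths (such as 'name_list.csv') are resolved against.
cwd = os.getcwd()
print("Current working directory:", cwd)

# 'name_list.csv' is a hypothetical file name, not a file from this page.
csv_path = os.path.join(cwd, "name_list.csv")
print("Is the file in this folder?", os.path.exists(csv_path))
```

If the second line prints `False`, the file is not in the folder the notebook is reading from.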
CSV Sample output:
['U+FEFF/name', 'id', 'gender', 'location', 'phone']
['Chico', '1742', 'M', 'KLN', '3344']
['Ri', '1743', 'F', 'LOS', '5168']
['Ivy', '1655', 'F', 'MK', '7323']
In some cases, the csv you read may have an encoding issue like the one above. U+FEFF is the byte order mark, or BOM. It is used to tell the difference between big- and little-endian UTF-16 encoding; in a UTF-8 file it only acts as a signature at the start of the file. To omit the BOM, add an encoding argument to the `open` call in the `with` statement, as in `with open('chapter4-example-name_list.csv', encoding='utf-8-sig') as f:`. Further reading about this issue.
For more explanation, please refer to the documentation on the open function
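To see the effect of `utf-8-sig` without depending on a particular download, the sketch below writes a small csv with a BOM into a temporary folder and reads it back both ways. The file name and its contents are made up for illustration.

```python
import csv
import os
import tempfile

# Build a small CSV that starts with a UTF-8 BOM, as some spreadsheet exports do.
# The file name and rows are invented for this demo.
path = os.path.join(tempfile.mkdtemp(), "name_list.csv")
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows([["name", "id"], ["Chico", "1742"]])

# Plain utf-8 keeps the BOM as part of the first header cell.
with open(path, encoding="utf-8", newline="") as f:
    header_with_bom = next(csv.reader(f))

# utf-8-sig strips the BOM while decoding.
with open(path, encoding="utf-8-sig", newline="") as f:
    header_clean = next(csv.reader(f))

print(header_with_bom)  # the first cell starts with '\ufeff'
print(header_clean)
```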
This error is usually raised as follows:
'utf-8' codec can't decode byte 0xff in position x: invalid start byte
Solution:
First, copy and paste the contents into a new Google Sheets spreadsheet and download it as a new csv file.
Then we can read the data in the usual way.
# -*- coding: utf-8 -*-
import pandas as pd
df = pd.read_csv('REITs.csv')
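A byte `0xff` at the very start of a file is often the BOM of a UTF-16 file, so another option is to try a second encoding before re-exporting the data. The following is only a sketch under that assumption: `read_text_with_fallback` is a hypothetical helper, the list of encodings is a guess that you should adapt to your data, and the demo file is created on the spot.

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=("utf-8", "utf-16")):
    """Return (text, encoding) for the first encoding that decodes the file."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.read(), enc
        except UnicodeError:
            # This encoding does not fit the bytes in the file; try the next one.
            continue
    raise ValueError(f"none of {encodings} could decode {path}")

# Demo file saved as UTF-16, so it starts with a BOM that utf-8 cannot decode.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w", encoding="utf-16") as f:
    f.write("name,id\nChico,1742\n")

text, used = read_text_with_fallback(demo_path)
print(used)
```

The encoding found this way can then be passed to `pd.read_csv` through its `encoding` parameter.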
For example, using `os.listdir` to read a list of files:
import os

def read_txt(path):  # read all the files in the folder and return the cleaned content
    all_text = []
    for file in os.listdir(path):
        # os.listdir returns bare file names, so join them with the folder path
        with open(os.path.join(path, file), "r") as f:
            all_text.append(f.read())
    words = "".join(all_text)
    # replace each punctuation character with a space
    for ch in '+.!/_,$%^*("\')]|[—?:【】“”‘’':
        words = words.replace(ch, " ")
    return words

words = read_txt("text/")  # pass your own folder path that contains the .txt files
Error: 'utf-8' codec can't decode byte 0x80 in position 3131
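Even though your files are encoded in utf-8, the error above may still be raised. On a Mac this is often caused by the hidden `.DS_Store` file that Finder creates in the folder, which `os.listdir` also returns. You can use the following solution in the terminal:
cd desktop/chicoxyc/text  # cd to the folder where your txt files are
ls -a  # list all the files, including hidden ones
rm .DS_Store  # delete .DS_Store if there is one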
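Since Finder may create `.DS_Store` again later, it can also help to skip such files in the code itself. The sketch below shows one way to do that; `list_txt_files` is a hypothetical helper name, and the demo folder with its fake `.DS_Store` is created only for illustration.

```python
import os
import tempfile

def list_txt_files(path):
    """Return the sorted paths of the .txt files in `path`, ignoring everything else."""
    return sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if name.endswith(".txt") and not name.startswith(".")
    )

# Demo folder with one text file and a fake .DS_Store holding non-UTF-8 bytes.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "a.txt"), "w", encoding="utf-8") as f:
    f.write("hello")
with open(os.path.join(folder, ".DS_Store"), "wb") as f:
    f.write(b"\x00\x00\x80\xff")

txt_files = list_txt_files(folder)
print([os.path.basename(p) for p in txt_files])
```

Looping over the result of `list_txt_files(path)` instead of `os.listdir(path)` means hidden or binary files are never opened.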
This is a common `JSONDecodeError`, meaning that there is no JSON that can be parsed. You need to check the response object before doing any further processing.
For some specific example solutions, you can refer to here.
NOTE: This is not necessarily caused by an encoding problem. Sometimes malformed JSON will also cause it. A network problem may also lead to this type of error: the response is then not valid JSON, and may contain an error code or an error page instead.
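One way to check a response before parsing it is sketched below. It assumes you are using a library such as `requests`, where the response object exposes `status_code` and `text`; to keep the example runnable without a network connection, the hypothetical helper `parse_json_body` takes those two values directly.

```python
import json

def parse_json_body(status_code, body):
    """Return the parsed JSON, or None when the body cannot be used."""
    if status_code != 200:
        # The server reported a problem, so the body is probably an error message.
        print("Request failed with status", status_code)
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        # The body is empty, malformed, or not JSON at all (for example HTML).
        print("Body is not valid JSON:", err)
        return None

good = parse_json_body(200, '{"name": "Chico"}')
bad = parse_json_body(200, "<html>Service unavailable</html>")
failed = parse_json_body(503, "")
print(good, bad, failed)
```

With `requests`, the call would look like `parse_json_body(response.status_code, response.text)`.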
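We can use list slicing, `if...else` or `try...except` to test the boundary condition and separate the different elements we want.
Example: the following is the content of the series `df['time_countries']`, and we want to get the country and the time separately. Each entry starts with 上映时间 ("release date"), followed by the date and, in some entries, the country in parentheses.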
上映时间:1993-01-01
上映时间:1994-10-14(美国)
上映时间:1953-09-02(美国)
上映时间:1994-09-14(法国)
上映时间:1972-03-24(美国)
上映时间:1998-04-03
上映时间:1993-07-01(中国香港)
上映时间:2001-07-20(日本)
上映时间:1940-05-17(美国)
上映时间:1939-12-15(美国)
You might notice that some entries have no country, so we have to handle two different situations. `if...else` can help here. The label and the date always take up the first 15 characters, and the country, if there is one, follows from index 15 onwards. Therefore, we can use a length of 15 to set the condition.
def get_country(x):
    # If the length is greater than 15, split on '(' and take the second part,
    # then split on ')' and take the first part, which is the country name we want.
    # Otherwise there is no country, so we return an empty string.
    if len(x) > 15:
        return x.split('(')[1].split(')')[0]
    else:
        return ''

df['country'] = df['time_countries'].apply(get_country)

def get_time(x):
    # Every entry has a time, so we don't need a condition here. Split on ':' and take
    # the second part, then split on '(' and take the first part, which is the time we want.
    return x.split(':')[1].split('(')[0]

df['time'] = df['time_countries'].apply(get_time)
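The `try...except` option mentioned earlier avoids relying on the fixed length of 15. The sketch below is one possible version, not the method used above: `split_time_country` is a hypothetical helper, and it runs on two of the sample entries as plain strings so that it works without pandas.

```python
def split_time_country(entry):
    """Return (time, country) for one entry; country is '' when it is missing."""
    # The part after the first colon holds the date and, optionally, '(country)'.
    rest = entry.split(":", 1)[1]
    time = rest.split("(")[0]
    try:
        country = rest.split("(")[1].split(")")[0]
    except IndexError:
        # There is no '(' in the entry, so there is no country.
        country = ""
    return time, country

samples = ["上映时间:1993-01-01", "上映时间:1994-10-14(美国)"]
results = [split_time_country(s) for s in samples]
print(results)
```

On a DataFrame, the same function could be used with `df['time_countries'].apply(split_time_country)`, which yields one `(time, country)` tuple per row.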