Python

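Python is a widely used interpreted, high-level, general-purpose programming language created by Guido van Rossum and first released in 1991. Python is a successor to the ABC language and can also be viewed as a LISP dialect that uses conventional infix notation. Its design philosophy emphasizes code readability and concise syntax. Compared with C++ or Java, Python lets developers express ideas in less code.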

Learning

Online Interpreter
Online Handbooks
Online Tutorials
Python examples
Web scraping
Binance Public API Connector Python
Developers Forum
VS Code



pip

Installation

Tutorials

NOTE: The following commands still require an internet connection.

get-pip.py
# Latest version of python
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

# For python 2.7.x
curl https://bootstrap.pypa.io/2.7/get-pip.py -o get-pip.py

# Install pip with the downloaded script
sudo python get-pip.py

# Install pip
python3 -m pip install pip

Update the pip

pip install --upgrade pip

python3 -m pip install --upgrade pip

 

Module install

# Download the files required for the mkdocs module (requires an internet connection).
pip download -d <output-dir> mkdocs

# Offline install the module mkdocs
pip install <output-dir>/*.whl

Proxy server

pip install --proxy http://<usr_name>:<password>@<proxyserver_name>:<port#> <pkg_name> 
pip config set global.proxy http://account:password@xxx.com.tw:8080
pip config set global.trusted-host "pypi.python.org pypi.org files.pythonhosted.org"

Command

List installed modules

sudo pip list

Upgrade module

sudo pip install --upgrade <MODULENAME>

Export the list of installed modules

pip freeze > requirements.txt

Install modules in requirements.txt

pip install -r requirements.txt

Q & A

ERROR: Could not find a version that satisfies the requirement XXXX (from versions: none)

This error occurs when running pip install XXXX.

Solution:

Use this command instead: python -m pip install XXXX
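One likely reason this helps: `python -m pip` runs the pip that belongs to that exact interpreter, whereas a bare `pip` on the PATH may belong to a different Python installation. Below is a minimal sketch of the same idea from inside a script. The helper name `pip_install_command` is hypothetical, and the sketch only builds the command, so nothing is installed until it is passed to `subprocess.run`.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script.

    pip_install_command is a hypothetical helper name used for illustration.
    """
    # sys.executable is the path of the current Python interpreter,
    # so "-m pip" resolves to the pip installed for this interpreter.
    return [sys.executable, "-m", "pip", "install", package]


cmd = pip_install_command("mkdocs")
print(cmd)
# To actually install: subprocess.run(cmd, check=True)
```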

Examples

maxmind_db_ip_geolocator.py

Original Post: Python Basics for Hackers, Part 4: How to Find the Exact Location of any IP Address  

#! /usr/bin/python

#Hello fellow hackers! My name is Defalt
#I built a very basic version of this tool a long time ago and recently did a re-write
#The first re-write had some awkward usage of the argparse module, so this update is going to fix it
#Original version: http://pastebin.com/J5NLnThL
#This will query the MaxMind database to get an approximate geolocation of an IP address
#Happy hacking! -Defalt

import sys
import socket
import urllib
import gzip
import os
try:
	import pygeoip
except ImportError:
	print '[!] Failed to Import pygeoip'
	try:
		choice = raw_input('[*] Attempt to Auto-install pygeoip? [y/N] ')
	except KeyboardInterrupt:
		print '\n[!] User Interrupted Choice'
		sys.exit(1)
	if choice.strip().lower()[0] == 'y':
		print '[*] Attempting to Install pygeoip... ',
		sys.stdout.flush()
		try:
			import pip
			pip.main(['install', '-q', 'pygeoip'])
			import pygeoip
			print '[DONE]'
		except Exception:
			print '[FAIL]'
			sys.exit(1)
	elif choice.strip().lower()[0] == 'n':
		print '[*] User Denied Auto-install'
		sys.exit(1)
	else:
		print '[!] Invalid Decision'
		sys.exit(1)

class Locator(object):
	def __init__(self, url=False, ip=False, datfile=False):
		self.url = url
		self.ip = ip
		self.datfile = datfile
		self.target = ''
	def check_database(self):
		if not self.datfile:
			self.datfile = '/usr/share/GeoIP/GeoLiteCity.dat'
		else:
			if not os.path.isfile(self.datfile):
				print '[!] Failed to Detect Specified Database'
				sys.exit(1)
			else:
				return
		if not os.path.isfile(self.datfile):
			print '[!] Default Database Detection Failed'
			try:
				choice = raw_input('[*] Attempt to Auto-install Database? [y/N] ')
			except KeyboardInterrupt:
				print '\n[!] User Interrupted Choice'
				sys.exit(1)
			if choice.strip().lower()[0] == 'y':
				print '[*] Attempting to Auto-install Database... ',
				sys.stdout.flush()
				if not os.path.isdir('/usr/share/GeoIP'):
					os.makedirs('/usr/share/GeoIP')
				try:
					urllib.urlretrieve('http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz', '/usr/share/GeoIP/GeoLiteCity.dat.gz')
				except Exception:
					print '[FAIL]'
					print '[!] Failed to Download Database'
					sys.exit(1)
				try:
					with gzip.open('/usr/share/GeoIP/GeoLiteCity.dat.gz', 'rb') as compressed_dat:
						with open('/usr/share/GeoIP/GeoLiteCity.dat', 'wb') as new_dat:
							new_dat.write(compressed_dat.read())
				except IOError:
					print '[FAIL]'
					print '[!] Failed to Decompress Database'
					sys.exit(1)
				os.remove('/usr/share/GeoIP/GeoLiteCity.dat.gz')
				print '[DONE]\n'
			elif choice.strip().lower()[0] == 'n':
				print '[!] User Denied Auto-Install'
				sys.exit(1)
			else:
				print '[!] Invalid Choice'
				sys.exit(1)
	def query(self):
		if not not self.url:
			print '[*] Translating %s: ' %(self.url),
			sys.stdout.flush()
			try:
				self.target += socket.gethostbyname(self.url)
				print self.target
			except Exception:
				print '\n[!] Failed to Resolve URL'
				return
		else:
			self.target += self.ip
		try:
			print '[*] Querying for Records of %s...\n' %(self.target)
			query_obj = pygeoip.GeoIP(self.datfile)
			for key, val in query_obj.record_by_addr(self.target).items():
				print '%s: %s' %(key, val)
			print '\n[*] Query Complete!'
		except Exception:
			print '\n[!] Failed to Retrieve Records'
			return

if __name__ == '__main__':
	import argparse
	parser = argparse.ArgumentParser(description='IP Geolocation Tool')
	parser.add_argument('--url', help='Locate an IP based on a URL', action='store', default=False, dest='url')
	parser.add_argument('-t', '--target', help='Locate the specified IP', action='store', default=False, dest='ip')
	parser.add_argument('--dat', help='Custom database filepath', action='store', default=False, dest='datfile')
	args = parser.parse_args()
	if ((not not args.url) and (not not args.ip)) or ((not args.url) and (not args.ip)):
		parser.error('invalid target specification')
	try:
		locate = Locator(url=args.url, ip=args.ip, datfile=args.datfile)
		locate.check_database()
		locate.query()
	except Exception:
		print '\n[!] An Unknown Error Occurred'

Tips

UTF-8 encoding declaration

#!/usr/bin/python
# -*- coding: utf-8 -*-

Find all installed modules

help("modules")

Module install path in the current environment

import powerline
powerline.__path__

# Return ['/home/alang/.local/lib/python3.10/site-packages/powerline']
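The same lookup can be done without importing the package first. This is a minimal sketch using `importlib.util.find_spec`; the helper name `module_location` is hypothetical, and the standard-library module `json` stands in for `powerline` so the example runs anywhere.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be loaded from, or None.

    module_location is a hypothetical helper name used for illustration.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None       # the module is not installed
    return spec.origin    # for a package this is its __init__.py


print(module_location("json"))
print(module_location("no_such_module_xyz"))
```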

Virtual Environment

Conda

# Create a virtual env
conda create -n myproj python=3.11

# Activate the virtual env
conda activate myproj

# Deactivate the virtual env
conda deactivate

Python 3.4+ built-in venv

# Install venv
sudo apt install python3-venv

# Create the venv
mkdir myproject
cd myproject
python -m venv .venv

# Activate the venv
source .venv/bin/activate

# Delete the venv
deactivate
rm -rf .venv

# Move the app directory after the venv was created
# (the venv scripts hard-code the old path, so rewrite it in place)
cd /path/to
mv old new
cd new/.venv/bin
old_path="/path/to/old/.venv"
new_path="/path/to/new/.venv"
find ./ -type f -exec sed -i "s|$old_path|$new_path|g" {} \;
cd /path/to/new
source .venv/bin/activate
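The venv steps can also be driven from Python through the standard-library `venv` module. A minimal sketch, assuming a throwaway temporary directory instead of a real project folder; the helper name `in_virtualenv` is hypothetical.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


# Create a throwaway environment; with_pip=False skips bootstrapping pip
# and keeps the example fast.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"
    venv.create(env_dir, with_pip=False)
    created = (env_dir / "pyvenv.cfg").is_file()

print("venv created:", created)
print("running inside a venv:", in_virtualenv())
```

The `pyvenv.cfg` file is what marks a directory as a virtual environment, so checking for it is a quick way to confirm that creation succeeded.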

With virtualenv and virtualenvwrapper

# Installing virtualenv and virtualenvwrapper
sudo pip install virtualenv virtualenvwrapper

# Update the profile ~/.bashrc
# Add the following lines

# Python virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

# Reload the profile
source ~/.bashrc

# Creating python virtual environment
# The py3cv3 is a self-defined name 
mkvirtualenv py3cv3 -p python3

# Enter the specified virtual environment
workon py3cv3

# Exit the specified virtual environment
deactivate

# List all of the environments.
lsvirtualenv

# Remove an environment
rmvirtualenv py3cv3
Print
for left in range(7):
  for right in range(left, 7):
    print("[" + str(left) + "|" + str(right) + "]", end=" ")
  print()

Print the List with join() 

greetings = ["Hello", "world"]
print(" ".join(greetings))  # Prints "Hello world"
Timestamp
import datetime

timestamp = datetime.datetime.now()
print("It is {}".format(timestamp.strftime("%A %d %B %Y %I:%M:%S%p")))
Math
total += 1
If-else
# Boolean, none
if motion is not None:
if not flag:

# Number
if delay > 0:
if delay == 0:
if total > frameCount:

# String
if "blue" in style:
if authors.startswith('['):
    authors = authors.lstrip('[').rstrip(']')

# One-liner
def doi_url(d): return f'http://{d}' if d.startswith('doi.org') else f'http://doi.org/{d}'

# Multiple conditions
temperature = 25
if temperature > 30:
    print('Hot')
elif temperature > 20 and temperature <= 30:
    print('Warm')
else:
    print('Cool')
    
# Negate a condition with not
temperature = 15
if not temperature > 20:
    print('Cool')
# Combined operators: not binds tightest, then and, then or
temperature = 25
humidity = 55
rain = 0
if temperature > 30 or humidity < 70 and not rain > 0:
    print('Dry conditions')

# Logical operators, AND, OR, NOT
if status >= 200 and status <= 226:
if status == 100 or status == 102:
if not(status >= 200 and status <= 226):

Comparison operators

operator    use
>           greater than
<           less than
>=          greater than or equal to
<=          less than or equal to
==          equal to
!=          not equal to
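Range checks such as `status >= 200 and status <= 226` can also be written as a chained comparison, which Python evaluates as both comparisons joined by `and`. A minimal sketch; the function name `is_success` is hypothetical.

```python
def is_success(status):
    """True for status codes in the 200-226 range (hypothetical helper)."""
    # Equivalent to: status >= 200 and status <= 226
    return 200 <= status <= 226


for code in (100, 200, 226, 404):
    print(code, is_success(code))
```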

sys.argv
import sys

logfile = sys.argv[1]
with open(logfile) as f:
  for line in f:
    if "CRON" not in line:
      continue
    print(line.strip())
argparse
import argparse
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--interval", required=False,
        help="Interval in seconds (default: 30)", default=30, type=int)
ap.add_argument("-o", "--output", required=False,
        help="Path to Output Logs (Default:std-out)")
ap.add_argument("mac", 
        help="MAC address of LYWSD02 device", nargs="+")
args = vars(ap.parse_args())

# Usage
intv = args["interval"]
logfile = args["output"]
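from argparse import ArgumentParser

# Placeholder: the original value of DEFAULT_CKPT_PATH is not shown in these notes.
DEFAULT_CKPT_PATH = "path/to/checkpoint"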

def _get_args():
    parser = ArgumentParser()
    parser.add_argument("-c", "--checkpoint-path", type=str, default=DEFAULT_CKPT_PATH,
                        help="Checkpoint name or path, default to %(default)r")
    parser.add_argument("--cpu-only", action="store_true", help="Run demo with CPU only")

    parser.add_argument("--share", action="store_true", default=False,
                        help="Create a publicly shareable link for the interface.")
    parser.add_argument("--inbrowser", action="store_true", default=False,
                        help="Automatically launch the interface in a new tab on the default browser.")
    parser.add_argument("--server-port", type=int, default=8000,
                        help="Demo server port.")
    parser.add_argument("--server-name", type=str, default="127.0.0.1",
                        help="Demo server name.")

    args = parser.parse_args()
    return args

def _test_args(args):
    if args.cpu_only:
        device_map = "cpu"
    else:
        device_map = "auto"

    ckp_path = args.checkpoint_path

    return device_map, ckp_path
  
def main():
    args = _get_args()
    device_map, ckp_path = _test_args(args)

if __name__ == '__main__':
    main()
#
# Nagios2 HTTP proxy test
#
# usage: check_http_proxy --proxy=proxy:port --auth=user:pass --url=url --timeout=10 --warntime=5 --expect=content

import sys
import getopt

def get_cmdline_cfg():
	try:
		opts, args = getopt.getopt(
			sys.argv[1:],
			"p:a:t:w:e:u:",
			["proxy=", "auth=", "timeout=", "warntime=", "expect=", "url="]
		)
	except getopt.GetoptError as err:
		print("SCRIPT CALLING ERROR: {0}".format(str(err)))
		sys.exit(2)  # opts is undefined past this point, so stop here

	### Build cfg dictionary
	cfg = {}
	for o, a in opts:
		if o in ("-p", "--proxy"):
			cfg["proxy"] = a
		elif o in ("-a","--auth"):
			cfg["auth"] = a
		elif o in ("-t","--timeout"):
			cfg["timeout"] = float(a)
		elif o in ("-w","--warntime"):
			cfg["warntime"] = float(a)
		elif o in ("-e","--expect"):
			cfg["expect"] = a
		elif o in ("-u","--url"):
			cfg["url"] = a

	# These are required
	for req_param in ("url", "proxy"):
		if req_param not in cfg:
			print("Missing parameter: {0}".format(req_param))
			sys.exit(2)  # stop instead of continuing with an incomplete cfg

	return cfg
  
# Usage
if __name__ == '__main__':
	cfg = get_cmdline_cfg()
    
	if "auth" in cfg:
		proxy_url = "http://{auth}@{proxy}/".format(**cfg)
	else:
		proxy_url = "http://{proxy}/".format(**cfg)
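For comparison, the same command line can be parsed with `argparse`, which handles required options and type conversion itself. This is a sketch rather than a drop-in replacement: the function name `parse_cmdline` and the sample arguments are hypothetical, and unlike the getopt version, options that were not given appear in the dict with the value `None` instead of being absent.

```python
import argparse


def parse_cmdline(argv=None):
    """Parse the same options as the getopt example, using argparse."""
    parser = argparse.ArgumentParser(description="HTTP proxy test")
    parser.add_argument("-p", "--proxy", required=True, help="proxy:port")
    parser.add_argument("-u", "--url", required=True, help="URL to fetch")
    parser.add_argument("-a", "--auth", help="user:pass")
    parser.add_argument("-t", "--timeout", type=float, help="seconds")
    parser.add_argument("-w", "--warntime", type=float, help="seconds")
    parser.add_argument("-e", "--expect", help="expected content")
    # vars() turns the Namespace into a dict, like cfg in the getopt version.
    return vars(parser.parse_args(argv))


# Hypothetical sample arguments; pass argv=None to read sys.argv instead.
cfg = parse_cmdline(["--proxy=proxy:3128", "--url=http://example.com", "-t", "10"])
if cfg["auth"]:
    proxy_url = "http://{auth}@{proxy}/".format(**cfg)
else:
    proxy_url = "http://{proxy}/".format(**cfg)
print(proxy_url)  # http://proxy:3128/
```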
    
Reading and Writing files

Open mode

Read file: read one line at a time; each line is returned as a string

Tip: a file opened with `with` is closed automatically, so no explicit close() is needed.

with open("spider.txt") as file:
    for line in file:
        print(line.strip().upper())

Read file: read the whole file at once; the content is returned as a list of lines

file = open("spider.txt")
lines = file.readlines()
file.close()
lines.sort()
print(lines)

Write a file: the content is passed as a string; on success, write() returns the number of characters written

with open("novel.txt", "w") as file:
    file.write("It was a dark and stormy night")

# write() returns 30 here: on success it returns the number of characters written

guests = open("guests.txt", "w")
initial_guests = ["Bob", "Andrea", "Manuel", "Polly", "Khalid"]

for i in initial_guests:
    guests.write(i + "\n")
    
guests.close()

Read and Write file

# Read a txt file
with open("update_log.txt", "r") as file:
    updates = file.read()

print(updates)

# Write a txt file
# With both "w" and "a", you can use the .write() method
# "a" if you want to append to a file
line = "jrafael,192.168.243.140,4:56:27,True"
with open("access_log.txt", "w") as file:
    file.write(line)

# Write a CSV or multi-lines file
login_file = """username,ip_address,time,date
tshah,192.168.92.147,15:26:08,2022-05-10
dtanaka,192.168.98.221,9:45:18,2022-05-09
tmitchel,192.168.110.131,14:13:41,2022-05-11
daquino,192.168.168.144,7:02:35,2022-05-08
eraab,192.168.170.243,1:45:14,2022-05-11
jlansky,192.168.238.42,1:07:11,2022-05-11
acook,192.168.52.90,9:56:48,2022-05-10
"""

with open("login.txt", "w") as file:
    file.write(login_file)

Encoding: if not specified, the operating system's default setting is used

f = open('workfile', 'w', encoding="utf-8")

with open('log_file', mode='r', encoding='UTF-8') as file:
    for log in file.readlines():
        print(log.strip())
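A short sketch that ties the open modes and the encoding argument together. It writes to a temporary file, so the path and the file name `notes.txt` are illustrative only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# "w" creates or truncates the file; "a" appends to the end.
with open(path, "w", encoding="utf-8") as f:
    f.write("第一行\n")
with open(path, "a", encoding="utf-8") as f:
    f.write("second line\n")

# Read back with the same encoding that was used for writing.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)  # ['第一行', 'second line']
```

Passing the encoding explicitly on both sides avoids depending on the platform default, which is what usually causes garbled non-ASCII text.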
File and Directory

Managing files

import os
os.remove("novel.txt")

os.rename("first_draft.txt", "finished_masterpiece.txt")

os.path.exists("finished_masterpiece.txt")
# Return True or False

os.path.getsize("spider.txt")
#This code will provide the file size

import datetime
timestamp = os.path.getmtime("spider.txt")
datetime.datetime.fromtimestamp(timestamp)
#This code will provide the date and time for the file in an 
#easy-to-understand format

os.path.abspath("spider.txt")
#This code takes the file name and turns it into an absolute path

Managing directories

os.mkdir("new_dir")
#The os.mkdir("new_dir") function creates a new directory called new_dir

os.chdir("new_dir")
os.getcwd()
#This code snippet changes the current working directory to new_dir. 
#The second line prints the current working directory.

os.mkdir("newer_dir")
os.rmdir("newer_dir")
#This code snippet creates a new directory called newer_dir. 
#The second line deletes the newer_dir directory.

import os
os.listdir("website")
#This code snippet returns a list of all the files and 
#sub-directories in the website directory.

dir = "website"
for name in os.listdir(dir):
    fullname = os.path.join(dir, name)
    if os.path.isdir(fullname):
        print("{} is a directory".format(fullname))
    else:
        print("{} is a file".format(fullname))

Using os module

# Create a directory and move a file from one directory to another
# using low-level OS functions.

import os

# Check to see if a directory named "test1" exists under the current
# directory. If not, create it:
dest_dir = os.path.join(os.getcwd(), "test1")
if not os.path.exists(dest_dir):
    os.mkdir(dest_dir)


# Construct source and destination paths:
src_file = os.path.join(os.getcwd(), "sample_data", "README.md")
dest_file = os.path.join(os.getcwd(), "test1", "README.md")


# Move the file from its original location to the destination:
os.rename(src_file, dest_file)

Using pathlib module

# Create a directory and move a file from one directory to another
# using Pathlib.

from pathlib import Path

# Check to see if the "test1" subdirectory exists. If not, create it:
dest_dir = Path("./test1/")
if not dest_dir.exists():
  dest_dir.mkdir()

# Construct source and destination paths:
src_file = Path("./sample_data/README.md")
dest_file = dest_dir / "README.md"

# Move the file from its original location to the destination:
src_file.rename(dest_file)
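Most of the `os` and `os.path` calls from the file and directory sections have rough `pathlib` counterparts. A minimal sketch, run inside a temporary directory; the `website` and `index.html` names are illustrative only.

```python
import datetime
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())          # scratch directory for the demo
(base / "website").mkdir()
page = base / "website" / "index.html"
page.write_text("<h1>hello</h1>", encoding="utf-8")

print(page.exists())                      # like os.path.exists
print(page.stat().st_size)                # like os.path.getsize
print(datetime.datetime.fromtimestamp(page.stat().st_mtime))  # like os.path.getmtime
print(page.resolve())                     # close to os.path.abspath (also follows symlinks)

# Like the os.listdir loop: classify each entry in a directory.
for entry in sorted(base.iterdir()):
    kind = "directory" if entry.is_dir() else "file"
    print("{} is a {}".format(entry.name, kind))
```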
os.environ
import os
import subprocess

my_env = os.environ.copy()
my_env["PATH"] = os.pathsep.join(["/opt/myapp/", my_env["PATH"]])

result = subprocess.run(["myapp"], env=my_env)
import os
print("HOME: " + os.environ.get("HOME", ""))
print("SHELL: " + os.environ.get("SHELL", ""))
print("FRUIT: " + os.environ.get("FRUIT", ""))

input
def to_seconds(hours, minutes, seconds):
    return hours*3600+minutes*60+seconds

print("Welcome to this time converter")

cont = "y"
while(cont.lower() == "y"):
    hours = int(input("Enter the number of hours: "))
    minutes = int(input("Enter the number of minutes: "))
    seconds = int(input("Enter the number of seconds: "))

    print("That's {} seconds".format(to_seconds(hours, minutes, seconds)))
    print()
    cont = input("Do you want to do another conversion? [y to continue] ")
    
print("Goodbye!")
subprocess

Run system commands in Python

import subprocess
subprocess.run(["date"])
subprocess.run(["sleep", "2"])
result = subprocess.run(["ls", "this_file_does_not_exist"], capture_output=True)
print(result.returncode)
print(result.stderr)
result = subprocess.run(["host", "8.8.8.8"], capture_output=True)
print(result.stdout)

# Output: b'8.8.8.8.in-addr.arpa domain name pointer dns.google.\n'

result = subprocess.run(["host", "8.8.8.8"], capture_output=True)
print(result.stdout.decode().split())
import os
import subprocess

my_env = os.environ.copy()
my_env["PATH"] = os.pathsep.join(["/opt/myapp/", my_env["PATH"]])

result = subprocess.run(["myapp"], env=my_env)
result_run = subprocess.run(['echo', 'Hello, World!'], capture_output=True, text=True)
result_run.stdout.strip()  # Extracting the stdout and stripping any extra whitespace

# Output: 'Hello, World!'
return_code_check_call = subprocess.check_call(['echo', 'Hello from check_call!'])
print(return_code_check_call)

# Output 0
output_check_output = subprocess.check_output(['echo', 'Hello from check_output!'], text=True)
output_check_output.strip()  # Extracting the stdout and stripping any extra whitespace

# Output 'Hello from check_output!'
process_popen = subprocess.Popen(['echo', 'Hello from popen!'], stdout=subprocess.PIPE, text=True)
output_popen, _ = process_popen.communicate()
output_popen.strip()  # Extracting the stdout and stripping any extra whitespace

# Output: 'Hello from popen!'
process = subprocess.Popen(['sleep', '5'])
message_1 = "The process is running in the background..."

# Give it a couple of seconds to demonstrate the asynchronous behavior
import time
time.sleep(2)

# Check if the process has finished
if process.poll() is None:
	message_2 = "The process is still running."
else:
	message_2 = "The process has finished."

print(message_1, message_2)
import os
import subprocess
from pathlib import Path

# subprocess
subprocess.run(['mkdir', 'test_dir_subprocess2'])

# OS
os.mkdir('test_dir_os2')

# Pathlib
test_dir_pathlib2 = Path('test_dir_pathlib2')
test_dir_pathlib2.mkdir(exist_ok=True) #Ensures the directory is created only if it doesn't already exist
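`subprocess.run` does not raise on a non-zero exit status unless `check=True` is passed. A minimal sketch of handling that failure; it launches the current Python interpreter as the child process (an assumption made so the example runs on any platform), and the child simply exits with status 3.

```python
import subprocess
import sys

# The child process exits with status 3 to simulate a failing command.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]

try:
    subprocess.run(failing, check=True)
    outcome = "ok"
except subprocess.CalledProcessError as err:
    # err.returncode holds the exit status of the failed command.
    outcome = "failed with exit code {}".format(err.returncode)

print(outcome)  # failed with exit code 3
```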
logging

Level: DEBUG, INFO, WARNING, ERROR, CRITICAL

import logging

logging.warning('This is a warning message')
logging.error('This is an error message')

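# NOTE: basicConfig() configures the root logger only once per process
# (unless force=True is passed), so run each snippet below on its own.
logging.basicConfig(level=logging.DEBUG)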
logging.debug('This is a debug message')

logging.basicConfig(filename='app.log', level=logging.DEBUG)
logging.info('This message will be written to app.log')

logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s', level=logging.DEBUG)
logging.error('This is an error with a custom format')
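When several of these settings are needed together, they can go into a single `basicConfig()` call. A minimal sketch, assuming Python 3.8+ for `force=True`; an in-memory `io.StringIO` buffer stands in for a log file so the result can be printed.

```python
import io
import logging

buffer = io.StringIO()   # stands in for a log file in this sketch

# force=True (Python 3.8+) replaces any handlers installed by an earlier
# basicConfig() or logging.warning() call.
logging.basicConfig(
    stream=buffer,
    level=logging.DEBUG,
    format="%(levelname)s - %(message)s",
    force=True,
)

logging.debug("This is a debug message")
logging.error("This is an error message")

print(buffer.getvalue())
```

To log to a file instead, replace `stream=buffer` with `filename='app.log'`; the two arguments cannot be combined.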

Functions

Parameter type annotation example

from typing import Dict, List, Optional, Tuple

def _gpt_parse_images(
        image_infos: List[Tuple[str, List[str]]],
        prompt_dict: Optional[Dict] = None,
        output_dir: str = './',
        api_key: Optional[str] = None,
        base_url: Optional[str] = None,
        model: str = 'gpt-4o',
        verbose: bool = False,
        gpt_worker: int = 1,
        **args
) -> str:
    """
    Parse images to markdown content.
    """
Print and Log
def print_f(*msg):
    '''print and log!'''
    # import datetime for timestamps
    import datetime as dt
    # convert input arguments to strings for concatenation
    message = []
    for m in msg:
        message.append(str(m))
    message = ' '.join(message)
    # append to the log file
    with open('/tmp/test.log','a') as log:
        log.write(f'{dt.datetime.now()} | {message}\n')
    # print the message using the copy of the original print function to stdout
    print(message)
    
print_f('Test Message')
Sendmail via SMTP
def send_message(body, subject, to_addr):
    import smtplib
    from email.message import EmailMessage
    smtp_user = "your-smtp-user"
    smtp_pass = "your-smtp-pass"
    smtp_server = "smtp-relay.your.server"
    smtp_port = 587

    msg = EmailMessage()
    msg['Subject'] = subject
    msg['From'] = smtp_user
    msg['To'] = to_addr
    msg.set_content(body)

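    with smtplib.SMTP(smtp_server, smtp_port) as smtp:
        # Port 587 is the submission port, which normally expects STARTTLS
        # before authentication.
        smtp.starttls()
        smtp.login(smtp_user, smtp_pass)
        smtp.send_message(msg)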

debug = send_message("This is plain TEXT email", "Test from SMTP", "alang.hsu@gmail.com")
print(debug)


THSRC API

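API authentication
  1. Client Id: obtained from the official website
  2. Client Secret: obtained from the official website
  3. Access Token: send an HTTP POST with the Client Id and Client Secret to authenticate and obtain an Access Token.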

Get Access Token

curl --request POST \
     --url 'https://tdx.transportdata.tw/auth/realms/TDXConnect/protocol/openid-connect/token' \
     --header 'content-type: application/x-www-form-urlencoded' \
     --data grant_type=client_credentials \
     --data client_id=YOUR_CLIENT_ID \
     --data client_secret=YOUR_CLIENT_SECRET
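The curl call can be sketched in Python with only the standard library. The helper name `build_token_request` is hypothetical, and the sketch builds the request without sending it, because sending needs real credentials and network access. The `access_token` field in the trailing comment is an assumption based on the usual OAuth 2.0 client-credentials response, since the response format is not shown in these notes.

```python
import urllib.parse
import urllib.request

TOKEN_URL = ("https://tdx.transportdata.tw/auth/realms/TDXConnect"
             "/protocol/openid-connect/token")


def build_token_request(client_id, client_secret):
    """Build (but do not send) the POST request from the curl example."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,   # the urlencoded form becomes the request body
        headers={"content-type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_token_request("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(req.get_method(), req.full_url)
# Sending it (assumed response field name, see the note above):
#   with urllib.request.urlopen(req) as resp:
#       token = json.load(resp)["access_token"]
```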

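Response format:

Case: for a given date, time window, and origin/destination stations, list the real-time remaining reserved-seat availability

API:

  1. /v2/Rail/THSR/DailyTimetable/Station/{StationID}/{TrainDate}
    • Get the station timetable for the given date and station
    • Filter by the time window to pick out the train numbers
  2. /v2/Rail/THSR/AvailableSeatStatus/Train/OD/{OriginStationID}/to/{DestinationStationID}/TrainDate/{TrainDate}
    • Get the real-time remaining reserved-seat data for the given [date] and [origin/destination stations]
    • Look up the remaining seats by train number

NOTE: Remaining-seat data is refreshed every ten minutes for today's trains; for other dates it is refreshed daily at 10:00, 16:00, and 22:00.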

 

JSON

JSON to dict

json.loads converts a JSON string; json.load reads from a file.

import json

person = '{"name": "Bob", "languages": ["English", "French"]}'
person_dict = json.loads(person)

# Output: {'name': 'Bob', 'languages': ['English', 'French']}
print( person_dict)

# Output: ['English', 'French']
print(person_dict['languages'])
Dict to JSON
import json

person_dict = {'name': 'Bob',
'age': 12,
'children': None
}
person_json = json.dumps(person_dict)

# Output: {"name": "Bob", "age": 12, "children": null}
print(person_json)
Read JSON file
import json

with open('path_to_file/person.json', 'r') as f:
  data = json.load(f)

# Output: {'name': 'Bob', 'languages': ['English', 'French']}
print(data)
Write JSON file

json.dump writes to a file; json.dumps converts data to a JSON string.

import json

person_dict = {"name": "Bob",
"languages": ["English", "French"],
"married": True,
"age": 32
}

with open('person.txt', 'w') as json_file:
  json.dump(person_dict, json_file)
Print JSON
import json

person_string = '{"name": "Bob", "languages": "English", "numbers": [2, 1.6, null]}'

# Getting dictionary
person_dict = json.loads(person_string)

# Pretty Printing JSON string back
print(json.dumps(person_dict, indent = 4, sort_keys=True))
Access JSON
import json

json_data = '''
{
    "students": [
        {
            "name": "David",
            "age": 19,
            "grades": {
                "math": 90,
                "english": 87
            }
        },
        {
            "name": "Harry",
            "age": 21,
            "grades": {
                "math": 85,
                "english": 95
            }
        }
    ]
}
'''

# Parse JSON Data
data = json.loads(json_data)

# To access a large dataset we can use `for loop`
for student in data["students"]:
    name = student["name"]
    math_mark = student["grades"]["math"]
    english_mark = student["grades"]["english"]
    average_mark = (math_mark + english_mark) / 2
    print(f"{name}, Average Marks: {average_mark:.2f}")


# Output:
# David, Average Marks: 88.50
# Harry, Average Marks: 90.00
import json

original_data_file="students_data.json"
updated_data_file="students_data_updated.json"

# reading `JSON file`
with open(original_data_file,"r") as file:
   students_result = json.load(file)

# Updating JSON Data
for student in students_result['students']:
    print(student['name'])
    
    if student['name'] == "Kabir":
        student['name'] = "John"
        
    grades = student['grades']
    average_mark = sum(grades.values()) / len(grades)
    student['average_mark'] = average_mark

# Saving updated data into a new file
with open(updated_data_file,"w") as file:
    json.dump(students_result,file,indent=4)
Get JSON from URL
import requests, json

# Response will be saved here
weather_data="weather_data.json"

# Request to `openweathermap` API
api_key = "YOUR_API_KEY"   # replace with your own API key
location = "Dhaka"
url = f"https://api.openweathermap.org/data/2.5/weather?q={location}&appid={api_key}&units=metric"

# Response
response = requests.get(url)

# Get `Place` and `Temperature` from the Response
if response.status_code == 200:
    json_data = response.json()
    print(f"Place: {json_data['name']}, Temperature: {json_data['main']['temp']} celsius")

    # Save the Response to a file (only when the request succeeded)
    with open(weather_data, "w") as file:
        json.dump(json_data, file, indent=4)
else:
    print(f"Request failed with status code {response.status_code}")


# Output:
# Place: Dhaka, Temperature: 27.99 celsius
# Handling a JSONDecodeError in Python
from json import JSONDecodeError
import requests
resp = requests.get('https://reqres.in/api/users/page4')
try:
    resp_dict = resp.json()
except JSONDecodeError:
    print('Response body is not valid JSON')
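The same error can be handled without a network call when decoding a string with `json.loads`. A minimal sketch; the helper name `parse_or_none` is hypothetical, and note that a valid JSON `null` also decodes to `None`, so this pattern only suits inputs where that ambiguity does not matter.

```python
import json


def parse_or_none(text):
    """Return the decoded value, or None when text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.lineno, err.colno and err.msg describe where decoding failed.
        print("Invalid JSON at line {}, column {}: {}".format(
            err.lineno, err.colno, err.msg))
        return None


print(parse_or_none('{"name": "Bob"}'))
print(parse_or_none("<html>not json</html>"))
```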
Data Type

When converting data with json.loads, note that the result may be a dict or a list, depending on the structure of the original JSON.

JSON                Python
object              dict
array               list
string              str
number (integer)    int
number (real)       float
true                True
false               False
null                None
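The mapping in the table can be checked directly by decoding one value of each JSON type. A minimal sketch; the sample JSON text is made up for illustration.

```python
import json

decoded = json.loads('[{"a": 1}, [1, 2], "text", 3, 2.5, true, false, null]')
for value in decoded:
    print(type(value).__name__, repr(value))

# A top-level array decodes to a list, a top-level object to a dict.
print(type(json.loads("[]")).__name__, type(json.loads("{}")).__name__)
```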
Library

jsonpath-ng

{
  "employees": [
    {
      "id": 1,
      "name": "Pankaj",
      "salary": "10000"
    },
    {
      "name": "David",
      "salary": "5000",
      "id": 2
    }
  ]
}
import json
from jsonpath_ng import jsonpath, parse

with open("db.json", 'r') as json_file:
    json_data = json.load(json_file)

print(json_data)

jsonpath_expression = parse('employees[*].id')

for match in jsonpath_expression.find(json_data):
    print(f'Employee id: {match.value}')

# Output:
# {'employees': [{'id': 1, 'name': 'Pankaj', 'salary': '10000'}, {'name': 'David', 'salary': '5000', 'id': 2}]}
# Employee id: 1
# Employee id: 2

Datetime

Time format codes
Today, Now
import datetime

dt_now = datetime.datetime.now()
print(dt_now)
# 2018-02-02 18:31:13.271231

print(type(dt_now))
# <class 'datetime.datetime'>

print(dt_now.year)
# 2018

print(dt_now.hour)
# 18
String to Datetime
from datetime import datetime

date_str = '09-19-2022'

date_object = datetime.strptime(date_str, '%m-%d-%Y').date()
print(type(date_object))
print(date_object)  # printed in default format

# Output:
# <class 'datetime.date'>
# 2022-09-19
from datetime import datetime

time_str = '13::55::26'
time_object = datetime.strptime(time_str, '%H::%M::%S').time()
print(type(time_object))
print(time_object)

# Output:
# <class 'datetime.time'>
# 13:55:26
from datetime import datetime
import locale

locale.setlocale(locale.LC_ALL, 'de_DE')
date_str_de_DE = '16-Dezember-2022 Freitag'  # de_DE locale
datetime_object = datetime.strptime(date_str_de_DE, '%d-%B-%Y %A')
print(type(datetime_object))
print(datetime_object)

# Output:
# <class 'datetime.datetime'>
# 2022-12-16 00:00:00
date
import datetime
d = datetime.date(2020,1,1)   # 2020-01-01
import datetime
today = datetime.date.today()
print(today)                 # 2021-10-19
print(today.year)            # 2021
print(today.month)           # 10
print(today.day)             # 19
print(today.weekday())       # 1    (Tuesday, so 1; Monday is 0)
print(today.isoweekday())    # 2    (Tuesday, so 2; Monday is 1)
print(today.isocalendar())   # (2021, 42, 2)  (the third number is the ISO weekday: Tuesday is 2)
print(today.isoformat())     # 2021-10-19
print(today.ctime())         # Tue Oct 19 00:00:00 2021
print(today.strftime('%Y.%m.%d'))    # 2021.10.19

newDay = today.replace(year=2020)
print(newDay)                # 2020-10-19
import datetime
d1 = datetime.date(2020, 6, 24)
d2 = datetime.date(2021, 11, 24)
print(abs(d1-d2).days)       # 518
time
import datetime
thisTime = datetime.time(12,0,0,1)
print(thisTime)   # 12:00:00.000001
import datetime
thisTime = datetime.time(14,0,0,1,tzinfo=datetime.timezone(datetime.timedelta(hours=8)))
print(thisTime)               # 14:00:00.000001+08:00
print(thisTime.isoformat())   # 14:00:00.000001+08:00
print(thisTime.tzname())      # UTC+08:00
print( thisTime.strftime('%H:%M:%S'))   # 14:00:00

newTime = thisTime.replace(hour=20)
print(newTime)                # 20:00:00.000001+08:00
datetime
import datetime
thisTime = datetime.datetime(2020,1,1,20,20,20,20)
print(thisTime)    # 2020-01-01 20:20:20.000020
import datetime
print(datetime.datetime.today())    # 2021-10-19 06:15:46.022925
print(datetime.datetime.now(tz=datetime.timezone(datetime.timedelta(hours=8))))
# 2021-10-19 14:15:46.027982+08:00
print(datetime.datetime.utcnow())   # 2021-10-19 06:15:46.028630
import datetime
now = datetime.datetime.now(tz=datetime.timezone(datetime.timedelta(hours=8)))
print(now)                # 2021-10-19 14:25:46.962975+08:00
print(now.date())         # 2021-10-19
print(now.time())         # 14:25:46.962975
print(now.tzname())       # UTC+08:00
print(now.weekday())      # 1
print(now.isoweekday())   # 2
print(now.isocalendar())  # (2021, 42, 2)
print(now.isoformat())    # 2021-10-19T14:25:46.962975+08:00
print(now.ctime())        # Tue Oct 19 14:48:38 2021
print(now.strftime('%Y/%m/%d %H:%M:%S'))  # 2021/10/19 14:48:38
print(now.timetuple())    # time.struct_time(tm_year=2021, tm_mon=10, tm_mday=19, tm_hour=16, tm_min=8, tm_sec=6, tm_wday=1, tm_yday=292, tm_isdst=-1)
timedelta

Date/time arithmetic

import datetime
today = datetime.datetime.now()
yesterday = today - datetime.timedelta(days=1)
tomorrow = today + datetime.timedelta(days=1)
nextweek = today + datetime.timedelta(weeks=1)
print(today)       # 2021-10-19 07:01:22.669886
print(yesterday)   # 2021-10-18 07:01:22.669886
print(tomorrow)    # 2021-10-20 07:01:22.669886
print(nextweek)    # 2021-10-26 07:01:22.669886
Timezone
import datetime
tzone = datetime.timezone(datetime.timedelta(hours=8))
now = datetime.datetime.now(tz=tzone)
print(now)    # 2021-10-19 15:07:51.128092+08:00
from datetime import datetime, timezone

# Get the current time in UTC
utc_time = datetime.now(timezone.utc)

print(utc_time)
from datetime import datetime
import pytz

timezone = pytz.timezone("America/New_York")

current_time_in_timezone = datetime.now(timezone)

print(current_time_in_timezone)
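An existing aware datetime can be converted to another zone with `astimezone()`. A minimal sketch using only fixed offsets from the standard library; the variable name `taipei` and the sample date are illustrative, and a fixed offset does not model daylight saving time.

```python
from datetime import datetime, timedelta, timezone

taipei = timezone(timedelta(hours=8))           # fixed UTC+08:00 offset
utc_noon = datetime(2021, 10, 19, 12, 0, tzinfo=timezone.utc)

local = utc_noon.astimezone(taipei)             # same instant, new offset
print(local)                # 2021-10-19 20:00:00+08:00
print(local == utc_noon)    # True: aware datetimes compare by instant
```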
Sleep
import time

time.sleep(5) # Pauses the code for 5 seconds
Timestamp

Get Current Time in Milliseconds

import time

milliseconds_since_epoch = time.time() * 1000

Get Current Timestamp

import time

current_timestamp = time.time()

print(current_timestamp)

Timestamp to a human-readable date

import time
from datetime import datetime

timestamp = time.time()

readable_date = datetime.fromtimestamp(timestamp)

print(readable_date)
Time Diff.
from datetime import datetime

time1 = datetime.now()

# ... some operations ...

time2 = datetime.now()

difference = time2 - time1

print(difference)

import time

start_time = time.time()

# ... some operations ...

end_time = time.time()

elapsed_time = end_time - start_time

print(f"Time elapsed: {elapsed_time} seconds")
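For measuring short durations, `time.perf_counter()` is usually a better fit than `time.time()`, because it is a monotonic, high-resolution clock that is unaffected by system clock adjustments. A minimal sketch, where the `time.sleep` call stands in for the work being measured:

```python
import time

start = time.perf_counter()
time.sleep(0.05)                 # stands in for the work being measured
elapsed = time.perf_counter() - start

print(f"Time elapsed: {elapsed:.3f} seconds")
```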
Function: date to day of the week
import datetime

def dow(date):
    dateobj = datetime.datetime.strptime(date, r"%Y-%m-%d")
    return dateobj.strftime("%A")

date_str = "2024-12-11"
print(dow(date_str))  # Output: Wednesday

Function: same date next year

import datetime
from datetime import date

def add_year(date_obj):
  try:
    new_date_obj = date_obj.replace(year = date_obj.year + 1)
  except ValueError:
    # This gets executed when the above method fails, 
    # which means that we're making a Leap Year calculation
    new_date_obj = date_obj.replace(year = date_obj.year + 4)
  return new_date_obj

def next_date(date_string):
  # Convert the argument from string to date object
  date_obj = datetime.datetime.strptime(date_string, r"%Y-%m-%d")
  next_date_obj = add_year(date_obj)
  #print("DEBUG", next_date_obj)

  # Convert the datetime object to string, 
  # in the format of "yyyy-mm-dd"
  next_date_string = next_date_obj.strftime("%Y-%m-%d")
  return next_date_string

today = date.today()  # Get today's date
#print("DEBUG Today: ", today)
print(next_date(str(today))) 
# Should return a year from today, unless today is Leap Day

print(next_date("2021-01-01")) # Should return 2022-01-01
print(next_date("2020-02-29")) # Should return 2024-02-29
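The add_year() function above jumps four years ahead when the input is Feb 29. A different policy, shown here only as a sketch, is to fall back to Feb 28 of the following year. The helper name add_one_year is hypothetical and not part of the original code.

```python
from datetime import date

def add_one_year(date_obj):
    """Return the same calendar day one year later (hypothetical helper).

    Feb 29 has no counterpart in the following year, so this sketch
    falls back to Feb 28 instead of skipping ahead four years.
    """
    try:
        return date_obj.replace(year=date_obj.year + 1)
    except ValueError:
        # For Feb 29 the target date does not exist; use Feb 28 instead
        return date_obj.replace(year=date_obj.year + 1, day=28)

print(add_one_year(date(2021, 1, 1)))    # 2022-01-01
print(add_one_year(date(2020, 2, 29)))   # 2021-02-28
```

Which policy is right depends on the use case, for example whether a yearly renewal should land on the last day of February or on the next real Feb 29.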

 

Resources

One-Liners

1) Multiple Variable Assignment

# Traditional way
a = 1
b = "ok"
c = False
 
# Pythonic way
a, b, c = 1, "ok", False
 
# Result
print(a, b, c)
# Show: 1 ok False

2) Variable Swap

# Traditional way
a = 1
b = "ok"
 
c = a
a = b
b = c
 
# Pythonic way
a, b = 1, "ok"
a, b = b, a
 
# Result
print(a, b)
# Shows: ok 1
# Pythonic way
a, b, c, d = 1, "ok", True, ["i", "j"]
a, b, c, d = c, a, d, b
 
# Result
print(a, b, c, d)
# Shows: True 1 ['i', 'j'] ok

3) Variable Conditional Assignment

x = 3
 
# Traditional way
if x % 2 == 1:
    result = f"{x} is odd"
else:
    result = f"{x} is even"
 
# Pythonic way
result = f"{x} " + ("is odd" if x % 2 == 1 else "is even")
 
# Result
print(result)
# Shows: 3 is odd

4) Presence of a Value in a List

pet_list = ["cat", "dog", "parrot"]
 
# Traditional way
found = False
for item in pet_list:
    if item == "cat":
        found = True
        break
 
# Pythonic way
found = "cat" in pet_list
 
# Result
print(found)
# Shows: True
pet_dict = {"cat": "Mitchi", "dog": "Max", "parrot": "Pepe"}
found = "cat" in pet_dict
print(found)
# Shows: True

5) Operations on Lists

my_list = [1, 2, 3, 4, 5]
 
# Traditional way
max_value = my_list[0]  # start from the first element so lists of negative numbers also work
for value in my_list:
    if value > max_value:
        max_value = value
 
# Pythonic way
max_value = max(my_list)
 
# Result
print(max_value)
# Shows: 5

6) List Creation with Duplicate Values

size = 10
 
# Traditional way
my_list = []
for i in range(size):
    my_list.append(0)
 
# Pythonic way
my_list = [0] * size
 
# Result
print(my_list)
# Shows: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
my_list = [1, 2] * 5

# Result: [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
my_tuple = (1, 2) * 5
print(my_tuple)
# Shows: (1, 2, 1, 2, 1, 2, 1, 2, 1, 2)

7) List Creation with Sequential Values

count = 10
 
# Traditional way
my_list = []
for i in range(count):
    my_list.append(i)
 
# Pythonic way
my_list = list(range(count))
 
# Result
print(my_list)
# Shows: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# List with odd values
my_list = list(range(1, 10, 2))
print(my_list)
# Shows: [1, 3, 5, 7, 9]
# List with descending values and negative values
my_list = list(range(5, -5, -1))
print(my_list)
# Shows: [5, 4, 3, 2, 1, 0, -1, -2, -3, -4]
my_set = set(range(count))
my_tuple = tuple(range(count))
 
# Result
print(my_set)
# Shows: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
print(my_tuple)
# Shows: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

8) List Creation with a Loop

count = 4
 
# Traditional way
my_list = []
for i in range(count):
    my_list.append(count**i)
 
# Pythonic way
my_list = [count**x for x in range(count)]
 
# Result
print(my_list)
# Shows: [1, 4, 16, 64]

my_set = set(count**x for x in range(count))
print(my_set)
# Shows: {1, 4, 16, 64}
squares = [i * i for i in range(5)]
# [0, 1, 4, 9, 16]

squares = [i * i for i in range(5) if i % 2 == 0]
# [0, 4, 16]

9) List Creation with Conditions if-else

users = [("Megan", 56),
("Karen", 32),
("Chad", 28),
("Brent", 44)]

# Traditional way
young_users = []
for user in users:
    if (user[1] < 35):
        young_users.append(user[0])
 
# Pythonic way
young_users = [x for x, y in users if y < 35]
 
# Result
print(young_users)
# ["Karen", "Chad"]
var = 42 if 3 > 2 else 999
# 42

10) Reading a File Line by Line

# Traditional way
lines = []
with open(filename) as file:
    for count, line in enumerate(file):
        lines.append(f"Line {count + 1}: " + line.strip())
 
# Pythonic way
with open(filename) as file:
    lines = [f"Line {count + 1}: " + line.strip() for count, line in enumerate(file)]
my_list = [line.strip() for line in open('filename.txt', 'r')]

11) Print without new lines

# No need to do this:
data = [0, 1, 2, 3, 4, 5]
for i in data:
    print(i, end=" ")
print()

# One-liner
print(*data)
# 0 1 2 3 4 5

12) Days left in year

import datetime;print((datetime.date(2023,1,1)-datetime.date.today()).days)
# 36 (when run on 2022-11-26; the target date 2023-01-01 is hard-coded)
>> python -c "import datetime;print((datetime.date(2023,1,1)-datetime.date.today()).days)"
36

>> alias daysleft='python -c "import datetime;print((datetime.date(2023,1,1)-datetime.date.today()).days)"'

>> daysleft
36

13) Reversing a List

a = [1, 2, 3, 4, 5, 6]
a = a[::-1]
# [6, 5, 4, 3, 2, 1]

14) Convert a space-separated string of numbers to a list of integers

user_input = "1 2 3 4 5 6"

my_list = list(map(int, user_input.split()))
# [1, 2, 3, 4, 5, 6]

 

 

List

A list is a mutable sequence of elements of any type. Lists store collections of items, can hold data of any type, and are written in square brackets.

a = [1, 2, 3, 4, 5]
b = ['mango', 'pineapple', 'orange']

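In Python, lists and strings are very similar: both are examples of sequences of data. Sequences share properties such as:

  1. A sequence can be iterated over with a for loop
  2. Sequences support indexing
  3. len() returns the length of a sequence
  4. The plus operator + concatenates sequences
  5. The in keyword checks whether a sequence contains a value

The difference between a list and a string is that a string's contents are immutable, while a list's contents are mutable.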
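The shared sequence operations and the mutability difference can be sketched as follows. The variable names and values are illustrative.

```python
word = "mango"
fruits = ["mango", "pineapple", "orange"]

# Shared sequence behaviour: len(), indexing, "in", and "+"
print(len(word), len(fruits))             # 5 3
print(word[0], fruits[0])                 # m mango
print("an" in word, "orange" in fruits)   # True True
print(word + "es")                        # mangoes
print(fruits + ["kiwi"])                  # ['mango', 'pineapple', 'orange', 'kiwi']

# Lists are mutable: item assignment changes the list in place
fruits[0] = "banana"
print(fruits)                             # ['banana', 'pineapple', 'orange']

# Strings are immutable: item assignment raises TypeError
try:
    word[0] = "t"
except TypeError as error:
    print("TypeError:", error)

# Build a new string instead of modifying the old one
new_word = "t" + word[1:]
print(new_word)                           # tango
```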

List methods

list.append()
numbers = [1, 2, 3, 4]
numbers.append(5)
print(numbers)
 
# output: [1, 2, 3, 4, 5]
list.insert()
animals = ["cat", "dog", "fish"]
animals.insert(1, "monkey")
print(animals)
 
# output: ["cat", "monkey", "dog", "fish"]

animals = ["cat", "dog", "fish"]
animals.insert(200, "monkey")
print(animals)
 
# output: ["cat", "dog", "fish", "monkey"]
list.extend()

Merge two lists

things = ["John", 42, True]
other_things = [0.0, False]
things.append(other_things)
print(things)
 
# output: ["John", 42, True, [0.0, False]]

things = ["John", 42, True]
other_things = [0.0, False]
things.extend(other_things)
print(things)
 
# output: ["John", 42, True, 0.0, False]
# This function accepts two variables, each containing a list of years.
# A current "recent_first" list contains [2022, 2018, 2011, 2006].
# An older "recent_last" list contains [1989, 1992, 1997, 2001].
# The lists need to be combined with the years in chronological order.
def record_profit_years(recent_first, recent_last):

    # Reverse the order of the "recent_first" list so that it is in 
    # chronological order.
    recent_first.reverse()

    # Extend the "recent_last" list by appending the newly reversed 
    # "recent_first" list.
    recent_last.extend(recent_first)

    # Return the "recent_last", which now contains the two lists 
    # combined in chronological order. 
    return recent_last

# Assign the two lists to the two variables to be passed to the 
# record_profit_years() function.
recent_first = [2022, 2018, 2011, 2006]
recent_last = [1989, 1992, 1997, 2001]



# Call the record_profit_years() function and pass the two lists as 
# parameters. 
print(record_profit_years(recent_first, recent_last))
# Should print [1989, 1992, 1997, 2001, 2006, 2011, 2018, 2022]
list.remove()
Note: If there are two of the same element in a list, the .remove() method only removes the first instance of that element and not all occurrences.
booleans = [True, False, True, True, False]
 
booleans.remove(False)   # Removes the first False value
print(booleans)
 
# output: [True, True, True, False]
 
booleans.remove(False)   # Removes the other False value
print(booleans)
 
# output: [True, True, True]
 
booleans.remove(False)   # ValueError! No more False values to remove
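Because .remove() deletes only the first match and raises ValueError when nothing matches, two small patterns are often useful: a comprehension that drops every occurrence, and a membership check before calling .remove(). A minimal sketch with illustrative data:

```python
grades = [7.8, 10.0, 7.9, 9.5, 10.0, 6.5]

# Build a new list that keeps everything except the unwanted value
without_tens = [grade for grade in grades if grade != 10.0]
print(without_tens)   # [7.8, 7.9, 9.5, 6.5]

# Guard remove() with a membership test to avoid ValueError
if 10.0 in without_tens:
    without_tens.remove(10.0)
print(without_tens)   # [7.8, 7.9, 9.5, 6.5] (unchanged, nothing left to remove)

# The original list is untouched by the comprehension
print(grades.count(10.0))   # 2
```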
list.pop()
fruits = ["apple", "orange", "banana", "peach"]
last_fruit = fruits.pop()  # takes the last element
print(last_fruit)
 
# output: "peach"
 
second_fruit = fruits.pop(1)  # takes the second element ( = index 1)
print(second_fruit)
 
# output: "orange"
 
print(fruits)  # only fruits that have not been "popped"
               # are still in the list
 
# output: ["apple", "banana"]
list.clear()
decimals = [0.1, 0.2, 0.3, 0.4, 0.5]
decimals.clear()  # remove all values!
print(decimals) 
 
# output: []
list.count()
grades = [7.8, 10.0, 7.9, 9.5, 10.0, 6.5, 9.8, 10.0]
n = grades.count(10.0)
print(n)
 
# output: 3
list.index()
Note: it only returns the index of the first occurrence of a list item.
friends = ["John", "James", "Jessica", "Jack"]
position = friends.index("Jessica")
print(position)
 
# output: 2
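When an item appears more than once, every position can be collected with enumerate(), or the search can be resumed with the optional start argument of .index(). A short sketch with an illustrative list:

```python
friends = ["John", "James", "Jessica", "James"]

# index() reports only the first occurrence
print(friends.index("James"))      # 1

# Collect every position with enumerate()
positions = [i for i, name in enumerate(friends) if name == "James"]
print(positions)                   # [1, 3]

# index() also accepts a start position for the search
print(friends.index("James", 2))   # 3
```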
list.sort() and list.reverse()
values = [10, 4, -2, 1, 5]
 
values.reverse()
print(values)  # list is reversed
 
# output: [5, 1, -2, 4, 10]
 
values.sort()
print(values)  # list is sorted
 
# output: [-2, 1, 4, 5, 10]
values = [10, 4, -2, 1, 5]
 
values.sort(reverse=True)
print(values)  # list is sorted in reverse order
 
# output: [10, 5, 4, 1, -2]
list.copy()
values_01 = [1, 2, 3, 4]
values_02 = values_01  # not an actual copy: same list object!
 
values_02.append(5)  # we modify the "values_02" list...
print(values_01)     # ... but changes appear also in "values_01"
                     #     because they reference the same list!
 
# output: [1, 2, 3, 4, 5]


values_01 = [1, 2, 3, 4]
values_02 = values_01.copy()  # create an independent copy!
 
values_02.append(5)  # we modify the "values_02" list...
print(values_01)     # ... and changes DO NOT appear in "values_01"
                     #     because it is a copy!
 
# output: [1, 2, 3, 4]
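Note that list.copy() makes a shallow copy: the outer list is new, but nested lists are still shared. For nested data, copy.deepcopy() from the standard library copies the inner objects as well. A minimal sketch with an illustrative nested list:

```python
import copy

matrix = [[1, 2], [3, 4]]

shallow = matrix.copy()        # new outer list, same inner lists
deep = copy.deepcopy(matrix)   # new outer list and new inner lists

matrix[0].append(99)           # mutate an inner list
matrix.append([5, 6])          # mutate the outer list

print(shallow)   # [[1, 2, 99], [3, 4]]  (inner change is visible)
print(deep)      # [[1, 2], [3, 4]]      (fully independent)
```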

List functions

sorted()/min()/max()
time_list = [12, 2, 32, 19, 57, 22, 14]
print(sorted(time_list))
print(time_list)

names = ["Carlos", "Ray", "Alex", "Kelly"]
print(sorted(names))  # Output ['Alex', 'Carlos', 'Kelly', 'Ray']
print(names)          # Output ['Carlos', 'Ray', 'Alex', 'Kelly']
print(sorted(names, key=len)) # Output ['Ray', 'Alex', 'Kelly', 'Carlos']

time_list = [12, 2, 32, 19, 57, 22, 14]
print(min(time_list))
print(max(time_list))
map()

Use map() and convert the map object to a list so we can print all the results at once.

# A simple function to add 1 to a given number
def add_one(number):
    return number + 1

# A list of numbers
numbers = [1, 2, 3, 4, 5]

# Use map to apply the function to each element in the list
result = map(add_one, numbers)

# Convert the map object to a list to print the result
print(list(result))

# Outputs: [2, 3, 4, 5, 6]
zip()

Use zip() to combine a list of names and ages into a list of tuples, and print all the tuples at once.

# Basic zip() tutorial example
>>> x = ['a', 'b', 'c']
>>> y = [1,   2,   3]
>>> zipped = zip(x, y)
>>> type(zipped) # Returns a 'zip' object, which is iterable
<class 'zip'>
>>> zipped
<zip object at 0x108e8bc80>
 
## Loop over the contents of the zip object
>>> for i in zip(x, y):
...     print(i)
('a', 1)
('b', 2)
('c', 3)
 
# list() or set() can also convert the iterator into another data type
>>> list(zip(x, y)) 
[('a', 1), ('b', 2), ('c', 3)]
>>> set(zip(x, y))
{('c', 3), ('b', 2), ('a', 1)}
# Two lists
names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 35]

# Use zip to combine the lists
combined = zip(names, ages)

# Convert the zip object to a list to print the result
print(list(combined))

# Outputs: [('Alice', 25), ('Bob', 30), ('Charlie', 35)]
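Two details of zip() are easy to miss: it stops at the shortest input, and zip(*pairs) reverses the operation. The sketch below also uses itertools.zip_longest from the standard library to pad the shorter input; the sample data is illustrative.

```python
from itertools import zip_longest

names = ["Alice", "Bob", "Charlie"]
ages = [25, 30]   # one age is missing on purpose

# zip() stops as soon as the shortest input is exhausted
print(list(zip(names, ages)))
# [('Alice', 25), ('Bob', 30)]

# zip_longest() pads the shorter input with fillvalue
print(list(zip_longest(names, ages, fillvalue=None)))
# [('Alice', 25), ('Bob', 30), ('Charlie', None)]

# zip(*pairs) "unzips" a list of tuples back into separate tuples
pairs = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
unzipped_names, unzipped_ages = zip(*pairs)
print(unzipped_names)   # ('Alice', 'Bob', 'Charlie')
print(unzipped_ages)    # (25, 30, 35)
```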

Extracting from a list

# An element from a list
username_list = ["elarson", "fgarcia", "tshah", "sgilmore"]
print(username_list[2])

# one-liner
print(["elarson", "fgarcia", "tshah", "sgilmore"][2])

# A slice from a list
username_list = ["elarson", "fgarcia", "tshah", "sgilmore"]
print(username_list[0:2])

List with Loop

animals = ["Lion", "Zebra", "Dolphin", "Monkey"]
chars = 0
for animal in animals:
  chars += len(animal)

print("Total characters: {}, Average length: {}".format(chars, chars/len(animals)))

# Output: Total characters: 22, Average length: 5.5

enumerate() yields a tuple for each element of the list. The first value in the tuple is the element's index in the sequence; the second value is the element itself.

winners = ["Ashley", "Dylan", "Reese"]
for index, person in enumerate(winners):
  print("{} - {}".format(index + 1, person))

# Output: 
#1 - Ashley
#2 - Dylan
#3 - Reese

Print one item per line, separated by a blank line ("\n\n")

IDs = ["001","002","003","004"]
print("\n\n".join(IDs))

For + If

mylist = [1, 4, 7, 8, 20]

newlist = [x for x in mylist if x % 2 == 0]
print(newlist)

Range()

mylist = ["a", "b", "c", "d", "e", "f", "g"]

for x in range(2, len(mylist) - 1):
    print(mylist[x])

List comprehensions

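A list comprehension consists of a pair of square brackets containing an expression, followed by a for clause, then zero or more for or if clauses. The result is a new list produced by evaluating the expression in the context of the for and if clauses that follow it.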
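A comprehension with more than one for clause plus an if clause can be sketched as follows, together with the equivalent nested loops. The sizes and colors lists are illustrative.

```python
sizes = ["S", "M"]
colors = ["red", "blue", "green"]

# Two for clauses and one if clause, read left to right like nested loops
combos = [(size, color) for size in sizes for color in colors if color != "blue"]
print(combos)
# [('S', 'red'), ('S', 'green'), ('M', 'red'), ('M', 'green')]

# The same logic written as explicit nested loops
expanded = []
for size in sizes:
    for color in colors:
        if color != "blue":
            expanded.append((size, color))

print(expanded == combos)   # True
```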

for loop vs. list comprehensions

# For Loop
multiples = []
for x in range(1,11):
  multiples.append(x*7)

print(multiples)

# List comprehensions
multiples = [x*7 for x in range(1,11)]
print(multiples)
# Output [7, 14, 21, 28, 35, 42, 49, 56, 63, 70]

Examples: Basic

languages = ["Python", "Perl", "Ruby", "Go", "Java", "C"]
lengths = [len(language) for language in languages]
print(lengths)

# Output [6, 4, 4, 2, 4, 1]
z = [x for x in range(0,101) if x % 3 == 0]
print(z)

# Output [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99]

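NOTE: The position of the condition changes the resulting list. An if-else placed before the for clause transforms elements but keeps them all; an if placed after the for clause filters elements out.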

years = ["January 2023", "May 2025", "April 2023", "August 2024", "September 2025", "December 2023"]

updated_years = [year.replace("2023","2024") if year[-4:] == "2023" else year for year in years]

print(updated_years) 
# Should print ["January 2024", "May 2025", "April 2024", "August 2024", "September 2025", "December 2024"]
years = ["January 2023", "May 2025", "April 2023", "August 2024", "September 2025", "December 2023"]

updated_years = [year.replace("2023","2024") for year in years if year[-4:] == "2023"]

print(updated_years) 
# Should print ['January 2024', 'April 2024', 'December 2024']

Examples: Build a list of tuples

# Create a list of tuples where each tuple contains the numbers 1, 2, and 3.
numbers = [(1, 2, 3) for _ in range(5)]

# numbers: [(1, 2, 3), (1, 2, 3), (1, 2, 3), (1, 2, 3), (1, 2, 3)]

Examples: A function that returns a list

def squares(start, end):
    return [ n * n for n in range(start, end+1) ]

print(squares(2, 3))    # Should print [4, 9]
print(squares(1, 5))    # Should print [1, 4, 9, 16, 25]
print(squares(0, 10))   # Should print [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Examples: A function that lists odd numbers

def odd_numbers(x, y):
    return [n for n in range(x, y) if n % 2 != 0]

# Call the odd_numbers() function with two parameters.
print(odd_numbers(5, 15)) 
# Should print [5, 7, 9, 11, 13]


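String

A string is an immutable sequence of characters: a collection of characters enclosed in single or double quotes, which can include letters, digits, and special characters.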

Concatenate

secret_password = 'jhk7GSH8ds'
print('Password hint: the third letter of your password is ' + secret_password[2])
# Escaping characters
introduction = 'Hello, I\'m John!'
print(introduction)

# Joining strings
user_age = 28
user_name = 'John'
greeting = user_name + ', you are ' + str(user_age) + '!'
print(greeting)
s = 'String'
s += ' Concatenation'
print(s)
# Using %  NOTE: legacy style, mainly seen in older code
s1, s2, s3 = 'Python', 'String', 'Concatenation'
s = '%s %s %s' % (s1, s2, s3)
print(s)

# Using format()
s1, s2, s3 = 'Python', 'String', 'Concatenation'
s = '{} {} {}'.format(s1, s2, s3)
print(s)

# Using f-string
s1, s2, s3 = 'Python', 'String', 'Concatenation'
s = f'{s1} {s2} {s3}'
print(s)

Parsing

split()
"This is another example".split()
# Return ['This', 'is', 'another', 'example']
test = "How-much-wood-would-a-woodchuck-chuck"
print(test.split("-"))    # prints ['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck']
removed_users = "wjaffrey jsoto abernard jhill awilliam"
print("before .split():", removed_users)
removed_users = removed_users.split()
print("after .split():", removed_users)
with open("update_log.txt", "r") as file:
    updates = file.read()
updates = updates.split()
msg = "2024/12/11|Hello World|aaa@bb.com"
date, title, emails = msg.split("|")
print(date)
join()

.join() : convert a list into a string

approved_users = ["elarson", "bmoreno", "tshah", "sgilmore", "eraab"]
print("before .join():", approved_users)
approved_users = ",".join(approved_users)
print("after .join():", approved_users)

with open("update_log.txt", "r") as file:
    updates = file.read()
updates = updates.split()
updates = " ".join(updates)
with open("update_log.txt", "w") as file:
    file.write(updates)
# Join all items of a list with spaces and output a string
strings = ' '.join(my_list)

# Join all items of a list with blank lines and output a string
strings = '\n\n'.join(my_list)
def list_elements(list_name, elements):
    return "The " + list_name + " list includes: " + ", ".join(elements)

print(list_elements("Printers", ["Color Printer", "Black and White Printer", "3-D Printer"]))
# Should print "The Printers list includes: Color Printer, Black and White Printer, 3-D Printer"
index()

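.index() : return the index of the first occurrence of a substring (raises ValueError if it is not found)

string = "Hello, World"
print(string.index('W'))  # 7; the search is case-sensitive, so 'w' would raise ValueError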
def replace_domain(email, old_domain, new_domain):
  if "@" + old_domain in email:
    index = email.index("@" + old_domain)
    new_email = email[:index] + "@" + new_domain
    return new_email
  return email
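The replace_domain() function above checks with "in" before calling .index(), because .index() raises ValueError when the substring is missing. str.find() is the non-raising counterpart and returns -1 instead. A short sketch with an illustrative address:

```python
email = "user@example.com"   # illustrative address

# Both methods agree when the substring exists
print(email.index("@"))   # 4
print(email.find("@"))    # 4

# For a missing substring, find() returns -1 while index() raises ValueError
print(email.find("#"))    # -1
try:
    email.index("#")
except ValueError as error:
    print("ValueError:", error)

# Guarding with "in" keeps index() safe, as replace_domain() does
domain = email[email.index("@") + 1:] if "@" in email else ""
print(domain)             # example.com
```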
replace()

.replace(old,new) : Returns a new string where all occurrences of old have been replaced by new

test = "How much wood would a woodchuck chuck"
print(test.replace("wood", "plastic"))  # prints "How much plastic would a plasticchuck chuck"

Slicing

string1 = "Greetings, Earthlings"
print(string1[0])   # Prints “G”
print(string1[4:8]) # Prints “ting”
print(string1[11:]) # Prints “Earthlings”
print(string1[:5])  # Prints “Greet”

print(string1[-10:])     # Prints “Earthlings” again
phonenum = "2025551212"

# The first 3 digits are the area code:
area_code = "(" + phonenum[:3] + ")"
# area_code is (202)

# digits 4-6 of the string:
exchange = phonenum[3:6]
# exchange is 555

# the last four numbers:
line = phonenum[-4:]
# line is 1212

Formatting

name = "Manny"
number = len(name) * 3
print("Hello {}, your lucky number is {}".format(name, number))
name = "Manny"
print("Your lucky number is {number}, {name}.".format(name=name, number=len(name)*3))
price = 7.5
with_tax = price * 1.09
print(price, with_tax)
print("Base price: ${:.2f}. With Tax: ${:.2f}".format(price, with_tax))
def to_celsius(x):
  return (x-32)*5/9

for x in range(0,101,10):
  print("{:>3} F | {:>6.2f} C".format(x, to_celsius(x)))
  0 F | -17.78 C
 10 F | -12.22 C
 20 F |  -6.67 C
 30 F |  -1.11 C
 40 F |   4.44 C
 50 F |  10.00 C
 60 F |  15.56 C
 70 F |  21.11 C
 80 F |  26.67 C
 90 F |  32.22 C
100 F |  37.78 C
f-strings
name = "Micah"
print(f'Hello {name}')
item = "Purple Cup"
amount = 5
price = amount * 3.25
print(f'Item: {item} - Amount: {amount} - Price: {price:.2f}')

More methods

strip()

.strip() , .lstrip() , .rstrip() 

" yes ".strip()    # Return 'yes'
" yes ".lstrip()   # Return 'yes '
" yes ".rstrip()   # Return ' yes'

# Multiple methods
' yes '.upper().strip() # Return 'YES'
count()

.count() 

"The number of times e occurs in this string is 4".count("e")
# Return 4
endswith()

.endswith() 

"Forest".endswith("rest")
# Return True
isnumeric(), isalpha()

.isnumeric() , .isalpha() 

"Forest".isnumeric()         # Return False
"12345".isnumeric()          # Return True
"xyzzy".isalpha()            # Return True


Installation

Alternatives

Change the default path of the python command

alternatives --set python /usr/bin/python3
# Or
alternatives --config python

# Check the list
alternatives --list
Poetry

Poetry should be installed in a Python virtual environment, isolated from the main system.

curl -sSL https://install.python-poetry.org | python3 -

Unit Test

Unit testing

Pytest
unittest

Methods

Example 1: rearrange.py

#!/usr/bin/env python3

import re

def rearrange_name(name):
  result = re.search(r"^([\w .]*), ([\w .]*)$", name)
  if result is None:
    return name
  return "{} {}".format(result[2], result[1])

rearrange_test.py : 

#!/usr/bin/env python3

import unittest

from rearrange import rearrange_name

class TestRearrange(unittest.TestCase):
    
  def test_basic(self):  # Basic test case
    testcase = "Lovelace, Ada"
    expected = "Ada Lovelace"
    self.assertEqual(rearrange_name(testcase), expected)

  def test_empty(self):  # Edge case, such as zero, blank, negative numbers, or extremely large numbers
    testcase = ""
    expected = ""
    self.assertEqual(rearrange_name(testcase), expected)

  def test_double_name(self):   # Additional test case
    testcase = "Hopper, Grace M."
    expected = "Grace M. Hopper"
    self.assertEqual(rearrange_name(testcase), expected)

  def test_one_name(self):      # Additional test case
    testcase = "Voltaire"
    expected = "Voltaire"
    self.assertEqual(rearrange_name(testcase), expected)

# Run the tests
unittest.main()

Tip: Running unittest.main() in a Jupyter environment may raise an error. The fix is to call unittest.main(argv=['first-arg-is-ignored'], exit=False) instead.

The output of the result:

....
----------------------------------------------------------------------
Ran 4 tests in 0.000s

OK

Example 2: cakefactory.py

#!/usr/bin/env python3

from typing import List

class CakeFactory:
 def __init__(self, cake_type: str, size: str):
   self.cake_type = cake_type
   self.size = size
   self.toppings = []

   # Price based on cake type and size
   self.price = 10 if self.cake_type == "chocolate" else 8
   self.price += 2 if self.size == "medium" else 4 if self.size == "large" else 0

 def add_topping(self, topping: str):
     self.toppings.append(topping)
     # Adding 1 to the price for each topping
     self.price += 1

 def check_ingredients(self) -> List[str]:
     ingredients = ['flour', 'sugar', 'eggs']
     ingredients.append('cocoa') if self.cake_type == "chocolate" else ingredients.append('vanilla extract')
     ingredients += self.toppings
     return ingredients

 def check_price(self) -> float:
     return self.price

# Example of creating a cake and adding toppings
cake = CakeFactory("chocolate", "medium")
cake.add_topping("sprinkles")
cake.add_topping("cherries")
cake_ingredients = cake.check_ingredients()
cake_price = cake.check_price()


cake_ingredients, cake_price

cakefactory_test.py

#!/usr/bin/env python3

import unittest
from cakefactory import CakeFactory

class TestCakeFactory(unittest.TestCase):
 def test_create_cake(self):
   cake = CakeFactory("vanilla", "small")
   self.assertEqual(cake.cake_type, "vanilla")
   self.assertEqual(cake.size, "small")
   self.assertEqual(cake.price, 8) # Vanilla cake, small size

 def test_add_topping(self):
     cake = CakeFactory("chocolate", "large")
     cake.add_topping("sprinkles")
     self.assertIn("sprinkles", cake.toppings)

 def test_check_ingredients(self):
     cake = CakeFactory("chocolate", "medium")
     cake.add_topping("cherries")
     ingredients = cake.check_ingredients()
     self.assertIn("cocoa", ingredients)
     self.assertIn("cherries", ingredients)
     self.assertNotIn("vanilla extract", ingredients)

 def test_check_price(self):
     cake = CakeFactory("vanilla", "large")
     cake.add_topping("sprinkles")
     cake.add_topping("cherries")
     price = cake.check_price()
     self.assertEqual(price, 13) # Vanilla cake, large size + 2 toppings


# Running the unittests
unittest.TextTestRunner().run(unittest.TestLoader().loadTestsFromTestCase(TestCakeFactory))

This results in the output:

..F.
======================================================================
FAIL: test_check_price (__main__.TestCakeFactory)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-9-32dbf74b3655>", line 33, in test_check_price
    self.assertEqual(price, 13) # Vanilla cake, large size + 2 toppings
AssertionError: 14 != 13

----------------------------------------------------------------------
Ran 4 tests in 0.007s

FAILED (failures=1)
<unittest.runner.TextTestResult run=4 errors=0 failures=1>

The program creates a TextTestRunner and calls its run() method, which returns a TextTestResult. The result reports one failure: the expected value in self.assertEqual(price, 13) was wrong, because a large vanilla cake with two toppings costs 8 + 4 + 2 = 14. To correct the test, update that part of the code to the following:

import unittest


# Fixing the test_check_price method
class TestCakeFactory(unittest.TestCase):
 # ... Other tests remain the same

 def test_check_price(self):
     cake = CakeFactory("vanilla", "large")
     cake.add_topping("sprinkles")
     cake.add_topping("cherries")
     price = cake.check_price()
     self.assertEqual(price, 14) # Vanilla cake, large size + 2 toppings

# Re-running the unittests
unittest.TextTestRunner().run(unittest.TestLoader().loadTestsFromTestCase(TestCakeFactory))

And now the program works as expected, as the results provide no failures and are:

....
----------------------------------------------------------------------
Ran 4 tests in 0.002s

OK

Regular Expression

Basic Regex

Character types
import re
re.findall(r"\w", "h32rb17")

import re
re.findall(r"\d", "h32rb17")
Boundary symbols
Quantify occurrences

Repetition symbols (quantifiers)

Functions

.findall()

.findall(<regex>, <string>) 

import re
re.findall(r"\d+", "h32rb17")

import re
re.findall(r"\d*", "h32rb17")

import re
re.findall(r"\d{2}", "h32rb17 k825t0m c2994eh")

import re
re.findall(r"\d{1,3}", "h32rb17 k825t0m c2994eh")
import re
pattern = r"\w+:\s\d+"
employee_logins_string = "1001 bmoreno: 12 Marketing 1002 tshah: 7 Human Resources 1003 sgilmore: 5 Finance"
print(re.findall(pattern, employee_logins_string))
['bmoreno: 12', 'tshah: 7', 'sgilmore: 5']
.search()

.search(<regex>, <string>, re.IGNORECASE) 

import re
log = "July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade"
regex = r"\[(\d+)\]"
result = re.search(regex, log)

print(result)     # Output: <re.Match object; span=(39, 46), match='[12345]'>
print(result[1])  # Output: 12345
import re
print(re.search(r"[Pp]ython", "Python"))

# Output: <re.Match object; span=(0, 6), match='Python'>
import re
print(re.search(r"Py.*n", "Pygmalion")) 
print(re.search(r"Py.*n", "Python Programming"))
print(re.search(r"Py[a-z]*n", "Python Programming"))
print(re.search(r"Py[a-z]*n", "Pyn"))

# Output:
# <re.Match object; span=(0, 9), match='Pygmalion'>
# <re.Match object; span=(0, 17), match='Python Programmin'>
# <re.Match object; span=(0, 6), match='Python'>
# <re.Match object; span=(0, 3), match='Pyn'>
import re
print(re.search(r"o+l+", "goldfish"))
print(re.search(r"o+l+", "woolly"))
print(re.search(r"o+l+", "boil"))

# Output:
# <re.Match object; span=(1, 3), match='ol'>
# <re.Match object; span=(1, 5), match='ooll'>
# None
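The signature shown at the top of this subsection includes re.IGNORECASE, which none of the examples above uses. A minimal sketch of the flag, with a log line adapted from the earlier example (the mixed-case "Error" is an illustrative change):

```python
import re

log = "July 31 07:51:48 mycomputer bad_process[12345]: Error Performing package upgrade"

# Without the flag the lowercase pattern does not match "Error"
print(re.search(r"error", log))   # None

# With re.IGNORECASE the match succeeds regardless of letter case
match = re.search(r"error", log, re.IGNORECASE)
print(match[0])       # Error
print(match.span())   # (48, 53)
```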
.split()
import re
re.split(r"[.?!]", "One sentence. Another one? And the last one!")

# Output: ['One sentence', ' Another one', ' And the last one', '']
re.split(r"the|a", "One sentence. Another one? And the last one!")

# Output: ['One sentence. Ano', 'r one? And ', ' l', 'st one!']
import re
re.split(r"([.?!])", "One sentence. Another one? And the last one!")

# Output: ['One sentence', '.', ' Another one', '?', ' And the last one', '!', '']
.sub()
import re
re.sub(r"[\w.%+-]+@[\w.-]+", "[REDACTED]", "Received an email for go_nuts95@my.example.com")

# Output: Received an email for [REDACTED]
re.sub(r"([A-Z])\.\s+(\w+)", r"Ms. \2", "A. Weber and B. Bellmas have joined the team.")

# Output: Ms. Weber and Ms. Bellmas have joined the team.
import re
re.sub(r"^([\w .-]*), ([\w .-]*)$", r"\2 \1", "Lovelace, Ada")

# Output: Ada Lovelace

Advanced Regex

Multiple alternatives

Alternation: a regex that matches any one of the alternatives separated by the pipe symbol (|)
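A short sketch of the pipe symbol in practice, using illustrative sample text. Parentheses limit how far the alternation reaches, and a non-capturing group (?:...) keeps re.findall() returning whole matches.

```python
import re

text = "I have a cat, a dog and a parrot."

# Match either alternative anywhere in the text
print(re.findall(r"cat|dog", text))   # ['cat', 'dog']

# Parentheses limit the alternation to one part of the pattern
print(re.search(r"gr(a|e)y", "The grey wall")[0])   # grey

# A non-capturing group keeps findall() returning the full match
print(re.findall(r"\b(?:cat|dog)s?\b", "cats, dog, catalog"))   # ['cats', 'dog']
```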

Character ranges

Common validations

IP addr.
import re

# Assign `log_file` to a string containing username, date, login time, and IP address for a series of login attempts 
log_file = "eraab 2022-05-10 6:03:41 192.168.152.148 \niuduike 2022-05-09 6:46:40 192.168.22.115 \nsmartell 2022-05-09 19:30:32 192.168.190.178 \narutley 2022-05-12 17:00:59 1923.1689.3.24 \nrjensen 2022-05-11 0:59:26 192.168.213.128 \naestrada 2022-05-09 19:28:12 1924.1680.27.57 \nasundara 2022-05-11 18:38:07 192.168.96.200 \ndkot 2022-05-12 10:52:00 1921.168.1283.75 \nabernard 2022-05-12 23:38:46 19245.168.2345.49 \ncjackson 2022-05-12 19:36:42 192.168.247.153 \njclark 2022-05-10 10:48:02 192.168.174.117 \nalevitsk 2022-05-08 12:09:10 192.16874.1390.176 \njrafael 2022-05-10 22:40:01 192.168.148.115 \nyappiah 2022-05-12 10:37:22 192.168.103.10654 \ndaquino 2022-05-08 7:02:35 192.168.168.144"

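# Assign `pattern` to a regular expression that matches IP-address-shaped strings: four groups of 1-3 digits separated by dots
# The \b word boundaries keep it from matching part of a longer digit run such as 192.168.103.10654
pattern = r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"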

# Use `re.findall()` on `pattern` and `log_file` and assign `valid_ip_addresses` to the output 
valid_ip_addresses = re.findall(pattern, log_file)

# Assign `flagged_addresses` to a list of IP addresses that have been previously flagged for unusual activity
flagged_addresses = ["192.168.190.178", "192.168.96.200", "192.168.174.117", "192.168.168.144"]

# Iterative statement begins here
# Loop through `valid_ip_addresses` with `address` as the loop variable
for address in valid_ip_addresses:

    # Conditional begins here
    # If `address` belongs to `flagged_addresses`, display "The IP address ______ has been flagged for further analysis."
    if address in flagged_addresses:
        print("The IP address", address, "has been flagged for further analysis.")

    # Otherwise, display "The IP address ______ does not require further analysis."
    else:
        print("The IP address", address, "does not require further analysis.")
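A digits-and-dots pattern checks only the shape of an address, so a string such as 192.168.300.1 still passes. One way to check the octet ranges as well, sketched here under the assumption that only IPv4 matters, is the standard library's ipaddress module. The helper name is_valid_ipv4 and the candidate list are hypothetical.

```python
import ipaddress
import re

# Illustrative candidates: two valid addresses and two invalid ones
candidates = ["192.168.152.148", "192.168.300.1", "1923.1689.3.24", "10.0.0.1"]

def is_valid_ipv4(text):
    """Return True if text parses as an IPv4 address (hypothetical helper)."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

# Shape check only: four groups of 1-3 digits separated by dots
shape = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

for candidate in candidates:
    print(candidate, bool(shape.fullmatch(candidate)), is_valid_ipv4(candidate))

# 192.168.152.148 True True
# 192.168.300.1 True False
# 1923.1689.3.24 False False
# 10.0.0.1 True True
```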
String-checking functions

Each function returns True or False

import re
def check_aei (text):
  result = re.search(r".*a.+e.+i.*", text)
  return result != None

print(check_aei("academia")) # True
print(check_aei("aerial")) # False
print(check_aei("paramedic")) # True

Function: check whether a string contains any punctuation

import re
def check_punctuation (text):
  result = re.search(r"[^a-zA-Z ]", text)
  return result != None

print(check_punctuation("This is a sentence that ends with a period.")) # True
print(check_punctuation("This is a sentence fragment without a period")) # False
print(check_punctuation("Aren't regular expressions awesome?")) # True
import re
def compare_strings(string1, string2):
  # Convert both strings to lowercase
  # and remove leading and trailing blanks
  string1 = string1.lower().strip()
  string2 = string2.lower().strip()

  # Removed punctuation
  punctuation = r"[.?!,;:\-']"

  string1 = re.sub(punctuation, r"", string1)
  string2 = re.sub(punctuation, r"", string2)

  # DEBUG CODE GOES HERE
  #print(string1 == string2)
  return string1 == string2

print(compare_strings("Have a Great Day!", "Have a great day?")) # True
print(compare_strings("It's raining again.", "its raining, again")) # True
print(compare_strings("Learn to count: 1, 2, 3.", "Learn to count: one, two, three.")) # False
print(compare_strings("They found some body.", "They found somebody.")) # False

Function: check web address

import re
def check_web_address(text):
  pattern = r"[\w-]*\.[a-zA-Z]*$"
  result = re.search(pattern, text)
  return result != None

print(check_web_address("gmail.com")) # True
print(check_web_address("www@google")) # False
print(check_web_address("www.Coursera.org")) # True
print(check_web_address("web-address.com/homepage")) # False
print(check_web_address("My_Favorite-Blog.US")) # True

Function: check time

import re
def check_time(text):
  pattern = r"^(1[0-2]|[1-9]):[0-5][0-9] *[AaPp][mM]$"  # hour 1-12; a character class like [1-9|10|11|12] matches single characters only
  result = re.search(pattern, text)
  return result != None

print(check_time("12:45pm")) # True
print(check_time("9:59 AM")) # True
print(check_time("6:60am")) # False
print(check_time("five o'clock")) # False
print(check_time("6:02 am")) # True
print(check_time("6:02km")) # False

Function: the text inside parentheses must start with an uppercase letter or a digit

import re
def contains_acronym(text):
  pattern = r"\([0-9A-Z][a-zA-Z]*\)"
  result = re.search(pattern, text)
  return result != None

print(contains_acronym("Instant messaging (IM) is a set of communication technologies used for text-based communication")) # True
print(contains_acronym("American Standard Code for Information Interchange (ASCII) is a character encoding standard for electronic communication")) # True
print(contains_acronym("Please do NOT enter without permission!")) # False
print(contains_acronym("PostScript is a fourth-generation programming language (4GL)")) # True
print(contains_acronym("Have fun using a self-contained underwater breathing apparatus (Scuba)!")) # True

Function: extract the PID and message from a log line

import re
def extract_pid(log_line):
    regex = r"\[(\d+)\]: ([A-Z]*) "
    result = re.search(regex, log_line)
    if result is None:
        return None
    return "{} ({})".format(result[1], result[2])

print(extract_pid("July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade")) # 12345 (ERROR)
print(extract_pid("99 elephants in a [cage]")) # None
print(extract_pid("A string that also has numbers [34567] but no uppercase message")) # None
print(extract_pid("July 31 08:08:08 mycomputer new_process[67890]: RUNNING Performing backup")) # 67890 (RUNNING)

Function: convert phone numbers

import re
def transform_record(record):
  new_record = re.sub(r"(.*,)(\d{3}-[\d-]+)(,.*)", r"\1+1-\2\3", record)
  return new_record

print(transform_record("Sabrina Green,802-867-5309,System Administrator")) 
# Sabrina Green,+1-802-867-5309,System Administrator

print(transform_record("Eli Jones,684-3481127,IT specialist")) 
# Eli Jones,+1-684-3481127,IT specialist

print(transform_record("Melody Daniels,846-687-7436,Programmer")) 
# Melody Daniels,+1-846-687-7436,Programmer

print(transform_record("Charlie Rivera,698-746-3357,Web Developer")) 
# Charlie Rivera,+1-698-746-3357,Web Developer

import re
def convert_phone_number(phone):
  result = re.sub(r"([\w ]+)(\d{3})-(\d{3}-\d{4}.*)$", r"\1(\2) \3", phone)
  return result

print(convert_phone_number("My number is 212-345-9999.")) # My number is (212) 345-9999.
print(convert_phone_number("Please call 888-555-1234")) # Please call (888) 555-1234
print(convert_phone_number("123-123-12345")) # 123-123-12345
print(convert_phone_number("Phone number of Buckingham Palace is +44 303 123 7300")) # Phone number of Buckingham Palace is +44 303 123 7300

# data/phones.csv:
#123-456-7890
#(123) 456-7890
#1234567890
#

import re

with open("data/phones.csv", "r") as phones:
  for phone in phones:
    # Strip the trailing newline so print() does not emit extra blank lines
    new_phone = re.sub(r"^\D*(\d{3})\D*(\d{3})\D*(\d{4})$", r"(\1) \2-\3", phone.strip())
    print(new_phone)

# Output
#(123) 456-7890
#(123) 456-7890
#(123) 456-7890

Function: words containing three or more consecutive vowels (a, e, i, o, u)

import re
def multi_vowel_words(text):
  pattern = r"\w+[aeiou]{3,}\w+"
  result = re.findall(pattern, text)
  return result

print(multi_vowel_words("Life is beautiful")) 
# ['beautiful']

print(multi_vowel_words("Obviously, the queen is courageous and gracious.")) 
# ['Obviously', 'queen', 'courageous', 'gracious']

print(multi_vowel_words("The rambunctious children had to sit quietly and await their delicious dinner.")) 
# ['rambunctious', 'quietly', 'delicious']

print(multi_vowel_words("The order of a data queue is First In First Out (FIFO)")) 
# ['queue']

print(multi_vowel_words("Hello world!")) 
# []

Using \b

\b matches a word boundary: the start or end of a word (a run of letters, digits, or underscores).

import re
print(re.findall(r"[a-zA-Z]{5}", "a scary ghost appeared"))

# Output: ['scary', 'ghost', 'appea']

import re
re.findall(r"\b[a-zA-Z]{5}\b", "A scary ghost appeared")

# Output: ['scary', 'ghost']

import re
def find_eid(report):
  # One uppercase letter, a hyphen, then exactly 7 or 8 digits
  pattern = r"[A-Z]-\d{7,8}\b"
  result = re.findall(pattern, report)
  return result


print(find_eid("Employees B-1234567 and C-12345678 worked with products X-123456 and Z-123456789")) 
# Should return ['B-1234567', 'C-12345678']
print(find_eid("Employees B-1234567 and C-12345678, not employees b-1234567 and c-12345678")) 
#Should return ['B-1234567', 'C-12345678']  

Capturing Groups

import re
result = re.search(r"^(\w*), (\w*)$", "Lovelace, Ada")
print(result)
print(result.groups())
print(result[0])
print(result[1])
print(result[2])
print("{} {}".format(result[2], result[1]))

# Output (Python 3.7+)
# <re.Match object; span=(0, 13), match='Lovelace, Ada'>
# ('Lovelace', 'Ada')
# Lovelace, Ada
# Lovelace
# Ada
# Ada Lovelace

Resources

Tuple

A tuple is similar to a list: it is a sequence of elements of any type, but it is immutable. Tuples are written with parentheses.

a = (1, 2, 3)
b = ('red', 'green', 'blue')

Example: access values by index

t = (1, 2, 3, 4, 5)
print(t[0])  # 1
print(t[1])  # 2
print(t[2])  # 3

Example: when a function returns several values at once, the returned data type is a tuple.

def convert_seconds(seconds):
  hours = seconds // 3600
  minutes = (seconds - hours * 3600) // 60
  remaining_seconds = seconds - hours * 3600 - minutes * 60
  return hours, minutes, remaining_seconds
result = convert_seconds(5000)
type(result)

# Output: <class 'tuple'>

Example: a tuple can be unpacked so that each value is assigned to its own variable name

def convert_seconds(seconds):
  hours = seconds // 3600
  minutes = (seconds - hours * 3600) // 60
  remaining_seconds = seconds - hours * 3600 - minutes * 60
  return hours, minutes, remaining_seconds
result = convert_seconds(5000)
hours, minutes, seconds = result
print(hours, minutes, seconds)

# Output: 1 23 20

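You may wonder why tuples exist when they are so similar to lists. Tuples are useful when we need to guarantee that an element stays at a given position and does not change. Because a list is mutable, the order of its elements can be changed. Because the order of the elements in a tuple cannot be changed, the position of an element in a tuple carries meaning. A good example is a function that returns multiple values: what it returns is a tuple of those values. The order of the returned values matters, and a tuple guarantees that the order does not change. Storing the elements of a tuple in separate variables is called unpacking. It lets you take multiple return values from a function and store each one in its own variable.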
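
As a minimal sketch of the immutability described above, the snippet below unpacks a tuple and then shows that assigning to one of its positions raises a TypeError, while a list built from the same values can be changed. The names point, changed, and point_list are hypothetical and chosen only for illustration.

```python
# Hypothetical example values, chosen only for illustration.
point = (3, 4)

# Unpacking: each element goes into its own variable.
x, y = point

# Tuples are immutable, so item assignment raises TypeError.
try:
    point[0] = 10
    changed = True
except TypeError:
    changed = False

# A list with the same contents can be modified in place.
point_list = list(point)
point_list[0] = 10

print(x, y)        # 3 4
print(changed)     # False
print(point_list)  # [10, 4]
```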

Example: iterating over a list of tuples

def full_emails(people):
  result = []
  for email, name in people:
    result.append("{} <{}>".format(name, email))
  return result
print(full_emails([("alex@example.com", "Alex Diego"), ("shay@example.com", "Shay Brandt")]))

# Output: ['Alex Diego <alex@example.com>', 'Shay Brandt <shay@example.com>']


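Dictionary

Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys. A key can be any immutable type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object, directly or indirectly, it cannot be used as a key. You cannot use a list as a key, because a list can be modified through index assignment, slice assignment, or methods such as append() and extend().

The best way to think about a dictionary is as a set of key: value pairs in which the keys are unique within one dictionary. A pair of braces creates an empty dictionary: {}. Placing a comma-separated list of key: value pairs inside the braces initializes the dictionary with those pairs. This is also the format used when a dictionary is printed.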
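
The key rules above can be checked with a short sketch. The dictionary name locations and all of its keys and values are hypothetical; the point is only which key types Python accepts.

```python
# Strings, numbers, and tuples of immutable values are valid keys.
# All names and values here are hypothetical, for illustration only.
locations = {}
locations["office"] = "Taipei"
locations[42] = "answer"
locations[(25.03, 121.56)] = "coordinates"

# A list is mutable, so it cannot be hashed and cannot be a key.
try:
    locations[[25.03, 121.56]] = "list key"
    list_key_ok = True
except TypeError:
    list_key_ok = False

# A tuple that contains a list is rejected for the same reason.
try:
    locations[("a", [1, 2])] = "nested list"
    nested_key_ok = True
except TypeError:
    nested_key_ok = False

print(len(locations))  # 3
print(list_key_ok)     # False
print(nested_key_ok)   # False
```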

Key type:

Data collections

dictionary1 = {"keyA":value1, "keyB":value2, "keyC":value3, "keyD":value4}

dictionary2 = {"keyA":["value1", "value2"], "keyB":["value3", "value4"]}

Look up a value by key

NOTE: If a key is repeated in a dictionary, the newer value overwrites the older one.

file_counts = {"jpg":10, "txt":14, "csv":2, "py":23}
file_counts["txt"]
# Output: 14

# When a key is repeated
file_counts = {"jpg":10, "txt":14, "csv":2, "py":23, "txt":99}
file_counts["txt"]
# Output: 99

Check whether a key exists

file_counts = {"jpg":10, "txt":14, "csv":2, "py":23}
"jpg" in file_counts
# Output: True

Add an element: dictionary[key] = value

file_counts = {"jpg":10, "txt":14, "csv":2, "py":23}
file_counts["cfg"] = 8
print(file_counts)
# Output {'jpg': 10, 'txt': 14, 'csv': 2, 'py': 23, 'cfg': 8}

Change the value for a given key: dictionary[key] = value

file_counts = {"jpg":10, "txt":14, "csv":2, "py":23}
file_counts["csv"] = 17
print(file_counts)
# Output {'jpg': 10, 'txt': 14, 'csv': 17, 'py': 23}

Delete the element for a given key

file_counts = {"jpg":10, "txt":14, "csv":2, "py":23, 'cfg':8}
del file_counts["cfg"]
print(file_counts)
# Output {'jpg': 10, 'txt': 14, 'csv': 2, 'py': 23}

Operations

Iterating over a dictionary with a for loop yields its keys by default.

file_counts = {"jpg":10, "txt":14, "csv":2, "py":23}
for extension in file_counts:
  print(extension)

# Output
jpg
txt
csv
py

Methods

.items()

.items() gives access to both the key and the value while iterating over a dictionary.

file_counts = {"jpg":10, "txt":14, "csv":2, "py":23}
for ext, amount in file_counts.items():
  print("There are {} files with the .{} extension".format(amount, ext))

# Output
There are 10 files with the .jpg extension
There are 14 files with the .txt extension
There are 2 files with the .csv extension
There are 23 files with the .py extension

# This function returns the total time, with minutes represented as 
# decimals (example: 1 hour 30 minutes = 1.5), for all end user time
# spent accessing a server in a given day. 


def sum_server_use_time(Server):

    # Initialize the variable as a float data type, which will be used
    # to hold the sum of the total hours and minutes of server usage by
    # end users in a day.
    total_use_time = 0.0

    # Iterate through the "Server" dictionary’s key and value items 
    # using a for loop.
    for key,value in Server.items():

        # For each end user key, add the associated time value to the
        # total sum of all end user use time.
        total_use_time += value
        
    # Round the return value and limit to 2 decimal places.
    return round(total_use_time, 2)  

FileServer = {"EndUser1": 2.25, "EndUser2": 4.5, "EndUser3": 1, "EndUser4": 3.75, "EndUser5": 0.6, "EndUser6": 8}

print(sum_server_use_time(FileServer)) # Should print 20.1

# This function receives a dictionary, which contains common employee 
# last names as keys, and a list of employee first names as values. 
# The function generates a new list that contains each employees’ full
# name (First_name Last_Name). For example, the key "Garcia" with the 
# values ["Maria", "Hugo", "Lucia"] should be converted to a list 
# that contains ["Maria Garcia", "Hugo Garcia", "Lucia Garcia"].


def list_full_names(employee_dictionary):
    # Initialize the "full_names" variable as a list data type using
    # empty [] square brackets.  
    full_names = []

    # The outer for loop iterates through each "last_name" key and 
    # associated "first_name" values, in the "employee_dictionary" items.
    for last_name, first_names in employee_dictionary.items():

        # The inner for loop iterates over each "first_name" value in 
        # the list of "first_names" for one "last_name" key at a time.
        for first_name in first_names:

            # Append the new "full_names" list with the "first_name" value
            # concatenated with a space " ", and the key "last_name". 
            full_names.append(first_name+" "+last_name)
            
    # Return the new "full_names" list once the outer for loop has 
    # completed all iterations. 
    return(full_names)


print(list_full_names({"Ali": ["Muhammad", "Amir", "Malik"], "Devi": ["Ram", "Amaira"], "Chen": ["Feng", "Li"]}))
# Should print ['Muhammad Ali', 'Amir Ali', 'Malik Ali', 'Ram Devi', 'Amaira Devi', 'Feng Chen', 'Li Chen']

.keys() and .values()

file_counts = {"jpg":10, "txt":14, "csv":2, "py":23}
file_counts.keys()   # Return dict_keys(['jpg', 'txt', 'csv', 'py'])
file_counts.values() # Return dict_values([10, 14, 2, 23])

file_counts = {"jpg":10, "txt":14, "csv":2, "py":23}
for value in file_counts.values():
  print(value)

# Output
10
14
2
23

def groups_per_user(group_dictionary):
    user_groups = {}
    # Go through group_dictionary
    for group, users in group_dictionary.items():
        # Now go through the users in the group
        for user in users:
            # Add the group to the list of groups for this user,
            # creating the entry in the dictionary if necessary
            if user in user_groups:
                user_groups[user].append(group)
            else:
                user_groups[user] = [group]

    return user_groups

print(groups_per_user({"local": ["admin", "userA"],
		"public":  ["admin", "userB"],
		"administrator": ["admin"] }))

# Should print {'admin': ['local', 'public', 'administrator'], 'userA': ['local'], 'userB': ['public']}

.update()

wardrobe = {'shirt': ['red', 'blue', 'white'], 'jeans': ['blue', 'black']}
new_items = {'jeans': ['white'], 'scarf': ['yellow'], 'socks': ['black', 'brown']}
wardrobe.update(new_items)

# wardrobe: {'shirt': ['red', 'blue', 'white'], 'jeans': ['white'], 'scarf': ['yellow'], 'socks': ['black', 'brown']}

.copy()

# The reset_scores() function accepts a dictionary "game_scores" as a parameter.
def reset_scores(game_scores):

    # The .copy() dictionary method is used to create a new copy of the "game_scores".
    new_game_scores = game_scores.copy() 

    # The for loop iterates over new_game_scores items, with the player as the key
    # and the score as the value. 
    for player, score in new_game_scores.items():
    
        # The dictionary operation to assign a new value to a key is used
        # to reset the score values to 0.
        new_game_scores[player] = 0
  
    return new_game_scores
 
# The dictionary is defined.
game1_scores = {"Arshi": 3, "Catalina": 7, "Diego": 6}
 
# Call the "reset_scores" function with the "game1_scores" dictionary. 
print(reset_scores(game1_scores))
# Should print {'Arshi': 0, 'Catalina': 0, 'Diego': 0}

Functions

sorted()

fruit = {"oranges": 3, "apples": 5, "bananas": 7, "pears": 2}

sorted(fruit.items())
# [('apples', 5), ('bananas', 7), ('oranges', 3), ('pears', 2)]

import operator
sorted(fruit.items(), key=operator.itemgetter(0))
# [('apples', 5), ('bananas', 7), ('oranges', 3), ('pears', 2)]

sorted(fruit.items(), key=operator.itemgetter(1))
# [('pears', 2), ('oranges', 3), ('apples', 5), ('bananas', 7)]

sorted(fruit.items(), key = operator.itemgetter(1), reverse=True)
# [('bananas', 7), ('apples', 5), ('oranges', 3), ('pears', 2)]

Google Python Course

Google Python training course

Course 1

Naming rules and conventions

When assigning names to objects, programmers adhere to a set of rules and conventions which help to standardize code and make it more accessible to everyone. Here are some naming rules and conventions that you should know:

Common syntax errors

Annotating variables by type

Annotating variables by type has several benefits: it reduces the chance of common mistakes, helps document your code for others to reuse, and allows integrated development environments (IDEs) and other tools to give you better feedback.

How to annotate a variable:

a = 3                  # a is an integer
captain = "Picard"     # type: str
captain: str = "Picard"

import typing
# Define a variable of type str
z: str = "Hello, world!"
# Define a variable of type int
x: int = 10
# Define a variable of type float
y: float = 1.23
# Define a variable of type list
list_of_numbers: typing.List[int] = [1, 2, 3]
# Define a variable of type tuple
tuple_of_numbers: typing.Tuple[int, int, int] = (1, 2, 3)
# Define a variable of type dict
dictionary: typing.Dict[str, int] = {"key1": 1, "key2": 2}
# Define a variable of type set
set_of_numbers: typing.Set[int] = {1, 2, 3}
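
One point worth keeping in mind is that Python itself does not check annotations at runtime; they are hints for readers, IDEs, and static type checkers. The sketch below illustrates this with a hypothetical function named double, which is not part of the course material.

```python
def double(value: int) -> int:
    # The annotations document intent; Python does not enforce them at runtime.
    return value * 2

print(double(4))     # 8
print(double("ab"))  # abab (no error at runtime, although a type checker would complain)

# The annotations are stored on the function object for tools to inspect.
print(double.__annotations__)
```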

Data type conversions

Implicit vs explicit conversion

Implicit conversion is where the interpreter helps us out and automatically converts one data type into another, without having to explicitly tell it to do so.

Example:

# Converting integer into a float
print(7+8.5)

Explicit conversion is where we manually convert from one data type to another by calling the relevant function for the data type we want to convert to.

We used this in our video example when we wanted to print a number alongside some text. Before we could do that, we needed to call the str() function to convert the number into a string.

Example:

# Convert a number into a string
base = 6
height = 3
area = (base*height)/2
print("The area of the triangle is: " + str(area)) 
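
Explicit conversion also works in the other direction. As a small sketch with hypothetical input values, the built-in functions int(), float(), and str() convert between text and numbers:

```python
# Hypothetical input values, for illustration only.
count_text = "42"
price_text = "3.50"

count = int(count_text)    # str -> int
price = float(price_text)  # str -> float
truncated = int(7.9)       # float -> int drops the fractional part

print(count + 1)                       # 43
print(price * 2)                       # 7.0
print(truncated)                       # 7
print("Total: " + str(count * price))  # Total: 147.0
```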

Operators

Arithmetic operators

Example for // & %

# even: an even number
def is_even(number):
    if number % 2 == 0:
        return True
    return False
# This code produces no output

def calculate_storage(filesize):
    block_size = 4096
    # Use floor division to calculate how many blocks are fully occupied
    full_blocks = filesize // block_size
    # Use the modulo operator to check whether there's any remainder
    partial_block_remainder = filesize % block_size
    # Depending on whether there's a remainder or not, return
    # the total number of bytes required to allocate enough blocks
    # to store your data.
    if partial_block_remainder > 0:
        return (full_blocks + 1) * block_size
    return full_blocks * block_size

print(calculate_storage(1))    # Should be 4096
print(calculate_storage(4096)) # Should be 4096
print(calculate_storage(4097)) # Should be 8192
print(calculate_storage(6000)) # Should be 8192

Comparison operators

Symbol  Name                               Expression  Description
==      Equality operator                  a == b      a is equal to b
!=      Not equal to operator              a != b      a is not equal to b
>       Greater than operator              a > b       a is larger than b
>=      Greater than or equal to operator  a >= b      a is larger than or equal to b
<       Less than operator                 a < b       a is smaller than b
<=      Less than or equal to operator     a <= b      a is smaller than or equal to b

Good coding style

Loops

While Loops

multiplier = 1
result = multiplier * 5
while result <= 50:
    print(result)
    multiplier += 1
    result = multiplier * 5
print("Done")

Common errors in Loops

For Loops

friends = ['Taylor', 'Alex', 'Pat', 'Eli']
for friend in friends:
    print("Hi " + friend)

# Convert °F to °C
def to_celsius(x):
  return (x-32)*5/9

for x in range(0,101,10):
  print(x, to_celsius(x))

for number in range(1, 6+1, 2):
    print(number * 3)

# The loop should print 3, 9, 15

Nested for Loops

# home_team: the home team, away_team: the visiting team
teams = [ 'Dragons', 'Wolves', 'Pandas', 'Unicorns']
for home_team in teams:
  for away_team in teams:
    if home_team != away_team:
      print(home_team + " vs " + away_team)

List comprehensions

List comprehension syntax: [x for x in sequence if condition]

# with for loop
numbers = [1, 2, 3, 4, 5]
squared_numbers = [x ** 2 for x in numbers]
print(squared_numbers)

# with for loop and if
sequence = range(10)
new_list = [x for x in sequence if x % 2 == 0]
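
To make the pattern [x for x in sequence if condition] concrete, the following sketch compares a comprehension with an equivalent explicit loop. The variable names evens_comprehension and evens_loop are illustrative and do not come from the course.

```python
sequence = range(10)

# Comprehension form: [x for x in sequence if condition]
evens_comprehension = [x for x in sequence if x % 2 == 0]

# Equivalent explicit loop that builds the same list step by step.
evens_loop = []
for x in sequence:
    if x % 2 == 0:
        evens_loop.append(x)

print(evens_comprehension)                # [0, 2, 4, 6, 8]
print(evens_comprehension == evens_loop)  # True
```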

Recursive function

Use cases for recursive functions:

  1. Go through a set of directories on your computer and calculate how many files each one contains.
  2. Review groups in Active Directory.

'''
def recursive_function(parameters):
    if base_case_condition(parameters):
        return base_case_value
    return recursive_function(modified_parameters)
'''

def factorial(n):
  if n < 2:
    return 1
  return n * factorial(n-1)

def factorial(n):
  print("Factorial called with " + str(n))
  if n < 2:
    print("Returning 1")
    return 1
  result = n * factorial(n-1)
  print("Returning " + str(result) + " for factorial of " + str(n))
  return result

factorial(4)
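
The directory use case mentioned earlier can be sketched in the same recursive style. This is only an illustration under stated assumptions: count_files is a hypothetical helper that returns one total for a whole tree rather than a per-directory breakdown, and the demo builds a throwaway directory tree with tempfile instead of reading real data.

```python
import os
import tempfile

def count_files(path):
    """Return the number of regular files under path, including subdirectories."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            # Recursive case: descend into the subdirectory.
            total += count_files(entry.path)
        elif entry.is_file(follow_symlinks=False):
            total += 1
    # Base case: a directory without subdirectories makes no further calls.
    return total

# Build a small temporary directory tree to demonstrate the function.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    for relative in ("one.txt",
                     os.path.join("a", "two.txt"),
                     os.path.join("a", "b", "three.txt")):
        with open(os.path.join(root, relative), "w") as handle:
            handle.write("demo")
    file_count = count_files(root)

print(file_count)  # 3
```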

Types of iterables

Resources

Naming rules and conventions

Annotating variables by type

Dictionaries vs. Lists

Dictionaries are similar to lists, but there are a few differences:

Both dictionaries and lists:

Dictionaries only:

pet_dictionary = {"dogs": ["Yorkie", "Collie", "Bulldog"], "cats": ["Persian", "Scottish Fold", "Siberian"], "rabbits": ["Angora", "Holland Lop", "Harlequin"]}  


print(pet_dictionary.get("dogs", 0))
# Should print ['Yorkie', 'Collie', 'Bulldog']

Lists only:

pet_list  = ["Yorkie", "Collie", "Bulldog", "Persian", "Scottish Fold", "Siberian", "Angora", "Holland Lop", "Harlequin"]


print(pet_list[0:3])
# Should print ['Yorkie', 'Collie', 'Bulldog']

 

Classes and methods

Defining classes and methods

class ClassName:
    def method_name(self, other_parameters):
        body_of_method

Special methods

With the __init__ method:

Purpose: accepts the arguments passed in and stores them in instance variables (self.XXX)

class Apple:
    def __init__(self, color, flavor):
        self.color = color
        self.flavor = flavor

honeycrisp = Apple("red", "sweet")
fuji = Apple("red", "tart")
print(honeycrisp.flavor)
print(fuji.flavor)

With the __str__ method:

When you print() an object, Python calls the object's __str__() method and outputs whatever that method returns.

class Apple:
    def __init__(self, color, flavor):
        self.color = color
        self.flavor = flavor

    def __str__(self):
        return "an apple which is {} and {}".format(self.color, self.flavor)

honeycrisp = Apple("red", "sweet")
print(honeycrisp)

# prints "an apple which is red and sweet"

With a custom method

class Triangle:
    def __init__(self, base, height):
        self.base = base
        self.height = height
    def area(self):
        return 0.5 * self.base * self.height
    def __add__(self, other):
        return self.area() + other.area()
    
triangle1 = Triangle(10, 5)
triangle2 = Triangle(6, 8)
print("The area of triangle 1 is", triangle1.area())
print("The area of triangle 2 is", triangle2.area())
print("The area of both triangles is", triangle1 + triangle2)

Examples

Login record report

def get_event_date(event):
  return event.date

def current_users(events):
  events.sort(key=get_event_date)
  machines = {}
  for event in events:
    if event.machine not in machines:
      machines[event.machine] = set()
    if event.type == "login":
      machines[event.machine].add(event.user)
    elif event.type == "logout":
      # discard() does not raise an error if the user has no recorded login
      machines[event.machine].discard(event.user)
  return machines

def generate_report(machines):
  for machine, users in machines.items():
    if len(users) > 0:
      user_list = ", ".join(users)
      print("{}: {}".format(machine, user_list))

class Event:
  def __init__(self, event_date, event_type, machine_name, user):
    self.date = event_date
    self.type = event_type
    self.machine = machine_name
    self.user = user

events = [
  Event('2020-01-21 12:45:46', 'login', 'myworkstation.local', 'jordan'),
  Event('2020-01-22 15:53:42', 'logout', 'webserver.local', 'jordan'),
  Event('2020-01-21 18:53:21', 'login', 'webserver.local', 'lane'),
  Event('2020-01-22 10:25:34', 'logout', 'myworkstation.local', 'jordan'),
  Event('2020-01-21 08:20:01', 'login', 'webserver.local', 'jordan'),
  Event('2020-01-23 11:24:35', 'login', 'mailserver.local', 'chris'),
]

users = current_users(events)
print(users)
# Output: {'webserver.local': {'lane'}, 'myworkstation.local': set(), 'mailserver.local': {'chris'}}

generate_report(users)
# Output:
# webserver.local: lane
# mailserver.local: chris

Analyzing Syslog

import re
import sys

logfile = sys.argv[1]
usernames = {}
with open(logfile) as f:
  for line in f:
    if "CRON" not in line:
      continue
    pattern = r"USER \((\w+)\)$"
    result = re.search(pattern, line)

    if result is None:
      continue
    name = result[1]
    usernames[name] = usernames.get(name, 0) + 1

print(usernames)

Advanced version

fishy.log:

July 31 02:25:52 mycomputername system[41921]: WARN Failed to start CPU thread[39016]
July 31 02:34:37 mycomputername kernel[32280]: INFO Loading...
July 31 02:36:44 mycomputername NetworkManager[90289]: WARN Failed to start CPU thread[39016]
July 31 02:39:01 mycomputername CRON[89330]: ERROR Unable to perform package upgrade
July 31 02:45:39 mycomputername utility[57387]: INFO Access permitted
July 31 02:58:44 mycomputername process[44707]: WARN Computer needs to be turned off and on again
July 31 02:59:35 mycomputername system[55024]: WARN Packet loss
July 31 03:09:30 mycomputername kernel[40705]: ERROR The cake is a lie!
July 31 03:23:16 mycomputername cacheclient[57185]: INFO Checking process [16121]
July 31 03:26:56 mycomputername cacheclient[90154]: INFO Healthy resource usage
July 31 03:28:52 mycomputername CRON[55441]: INFO Loading...
July 31 03:29:34 mycomputername dhcpclient[69232]: ERROR Unable to download more RAM
July 31 03:34:41 mycomputername NetworkManager[14120]: ERROR 404 error not found
July 31 03:36:26 mycomputername dhcpclient[79731]: ERROR The cake is a lie!
July 31 03:38:24 mycomputername CRON[92141]: INFO Access permitted
July 31 03:40:00 mycomputername dhcpclient[40114]: INFO Starting sync
July 31 03:42:45 mycomputername utility[53726]: INFO I'm sorry Dave. I'm afraid I can't do that
July 31 03:47:07 mycomputername NetworkManager[63805]: WARN Please reboot user
July 31 04:09:16 mycomputername CRON[52593]: WARN PC Load Letter
July 31 04:11:32 mycomputername CRON[51253]: ERROR: Failed to start CRON job due to script syntax error. Inform the CRON job owner!
July 31 04:11:32 mycomputername jam_tag=psim[84082]: ERROR ID: 10t
July 31 04:12:05 mycomputername utility[63418]: INFO Successfully connected
July 31 04:14:22 mycomputername utility[53225]: ERROR I am error
July 31 04:31:00 mycomputername NetworkManager[23060]: ERROR Out of yellow ink, specifically, even though you want grayscale

find_error.py

Usage: ./find_error.py fishy.log 

#!/usr/bin/env python3
import sys
import os
import re

def error_search(log_file):
    error = input("What is the error? ")
    # A matching line must contain "error" plus every word the user typed.
    error_patterns = ["error"] + [word.lower() for word in error.split(' ')]
    returned_errors = []

    # The with statement closes the file automatically.
    with open(log_file, mode='r', encoding='UTF-8') as file:
        for log in file:
            if all(re.search(error_pattern, log.lower()) for error_pattern in error_patterns):
                returned_errors.append(log)

    return returned_errors

def file_output(returned_errors):
    with open(os.path.expanduser('~') + '/data/errors_found.log', 'w') as file:
        for error in returned_errors:
            file.write(error)

if __name__ == "__main__":
    log_file = sys.argv[1]
    returned_errors = error_search(log_file)
    file_output(returned_errors)
    sys.exit(0)

Analyzing Syslog 2

syslog.log :

Jan 31 00:09:39 ubuntu.local ticky: INFO Created ticket [#4217] (mdouglas)
Jan 31 00:16:25 ubuntu.local ticky: INFO Closed ticket [#1754] (noel)
Jan 31 00:21:30 ubuntu.local ticky: ERROR The ticket was modified while updating (breee)
Jan 31 00:44:34 ubuntu.local ticky: ERROR Permission denied while closing ticket (ac)
Jan 31 01:00:50 ubuntu.local ticky: INFO Commented on ticket [#4709] (blossom)
Jan 31 01:29:16 ubuntu.local ticky: INFO Commented on ticket [#6518] (rr.robinson)
Jan 31 01:33:12 ubuntu.local ticky: ERROR Tried to add information to closed ticket (mcintosh)
Jan 31 01:43:10 ubuntu.local ticky: ERROR Tried to add information to closed ticket (jackowens)
Jan 31 01:49:29 ubuntu.local ticky: ERROR Tried to add information to closed ticket (mdouglas)
Jan 31 02:30:04 ubuntu.local ticky: ERROR Timeout while retrieving information (oren)
Jan 31 02:55:31 ubuntu.local ticky: ERROR Ticket doesn't exist (xlg)
Jan 31 03:05:35 ubuntu.local ticky: ERROR Timeout while retrieving information (ahmed.miller)
Jan 31 03:08:55 ubuntu.local ticky: ERROR Ticket doesn't exist (blossom)
Jan 31 03:39:27 ubuntu.local ticky: ERROR The ticket was modified while updating (bpacheco)
Jan 31 03:47:24 ubuntu.local ticky: ERROR Ticket doesn't exist (enim.non)
Jan 31 04:30:04 ubuntu.local ticky: ERROR Permission denied while closing ticket (rr.robinson)
Jan 31 04:31:49 ubuntu.local ticky: ERROR Tried to add information to closed ticket (oren)
Jan 31 04:32:49 ubuntu.local ticky: ERROR Timeout while retrieving information (mcintosh)
Jan 31 04:44:23 ubuntu.local ticky: ERROR Timeout while retrieving information (ahmed.miller)
Jan 31 04:44:46 ubuntu.local ticky: ERROR Connection to DB failed (jackowens)
Jan 31 04:49:28 ubuntu.local ticky: ERROR Permission denied while closing ticket (flavia)
Jan 31 05:12:39 ubuntu.local ticky: ERROR Tried to add information to closed ticket (oren)
Jan 31 05:18:45 ubuntu.local ticky: ERROR Tried to add information to closed ticket (sri)
Jan 31 05:23:14 ubuntu.local ticky: INFO Commented on ticket [#1097] (breee)
Jan 31 05:35:00 ubuntu.local ticky: ERROR Connection to DB failed (nonummy)
Jan 31 05:45:30 ubuntu.local ticky: INFO Created ticket [#7115] (noel)
Jan 31 05:51:30 ubuntu.local ticky: ERROR The ticket was modified while updating (flavia)
Jan 31 05:57:46 ubuntu.local ticky: INFO Commented on ticket [#2253] (nonummy)
Jan 31 06:12:02 ubuntu.local ticky: ERROR Connection to DB failed (oren)
Jan 31 06:26:38 ubuntu.local ticky: ERROR Timeout while retrieving information (xlg)
Jan 31 06:32:26 ubuntu.local ticky: INFO Created ticket [#7298] (ahmed.miller)
Jan 31 06:36:25 ubuntu.local ticky: ERROR Timeout while retrieving information (flavia)
Jan 31 06:57:00 ubuntu.local ticky: ERROR Connection to DB failed (jackowens)
Jan 31 06:59:57 ubuntu.local ticky: INFO Commented on ticket [#7255] (oren)
Jan 31 07:59:56 ubuntu.local ticky: ERROR Ticket doesn't exist (flavia)
Jan 31 08:01:40 ubuntu.local ticky: ERROR Tried to add information to closed ticket (jackowens)
Jan 31 08:03:19 ubuntu.local ticky: INFO Closed ticket [#1712] (britanni)
Jan 31 08:22:37 ubuntu.local ticky: INFO Created ticket [#2860] (mcintosh)
Jan 31 08:28:07 ubuntu.local ticky: ERROR Timeout while retrieving information (montanap)
Jan 31 08:49:15 ubuntu.local ticky: ERROR Permission denied while closing ticket (britanni)
Jan 31 08:50:50 ubuntu.local ticky: ERROR Permission denied while closing ticket (montanap)
Jan 31 09:04:27 ubuntu.local ticky: ERROR Tried to add information to closed ticket (noel)
Jan 31 09:15:41 ubuntu.local ticky: ERROR Timeout while retrieving information (oren)
Jan 31 09:18:47 ubuntu.local ticky: INFO Commented on ticket [#8385] (mdouglas)
Jan 31 09:28:18 ubuntu.local ticky: INFO Closed ticket [#2452] (jackowens)
Jan 31 09:41:16 ubuntu.local ticky: ERROR Connection to DB failed (ac)
Jan 31 10:11:35 ubuntu.local ticky: ERROR Timeout while retrieving information (blossom)
Jan 31 10:21:36 ubuntu.local ticky: ERROR Permission denied while closing ticket (montanap)
Jan 31 11:04:02 ubuntu.local ticky: ERROR Tried to add information to closed ticket (breee)
Jan 31 11:19:37 ubuntu.local ticky: ERROR Connection to DB failed (sri)
Jan 31 11:22:06 ubuntu.local ticky: ERROR Timeout while retrieving information (montanap)
Jan 31 11:31:34 ubuntu.local ticky: ERROR Permission denied while closing ticket (ahmed.miller)
Jan 31 11:40:25 ubuntu.local ticky: ERROR Connection to DB failed (mai.hendrix)
Jan 31 11:47:07 ubuntu.local ticky: INFO Commented on ticket [#4562] (ac)
Jan 31 11:58:33 ubuntu.local ticky: ERROR Tried to add information to closed ticket (ahmed.miller)
Jan 31 12:00:17 ubuntu.local ticky: INFO Created ticket [#7897] (kirknixon)
Jan 31 12:02:49 ubuntu.local ticky: ERROR Permission denied while closing ticket (mai.hendrix)
Jan 31 12:20:23 ubuntu.local ticky: ERROR Connection to DB failed (kirknixon)
Jan 31 12:20:40 ubuntu.local ticky: ERROR Ticket doesn't exist (flavia)
Jan 31 12:24:32 ubuntu.local ticky: INFO Created ticket [#5784] (sri)
Jan 31 12:50:10 ubuntu.local ticky: ERROR Permission denied while closing ticket (blossom)
Jan 31 12:58:16 ubuntu.local ticky: ERROR Tried to add information to closed ticket (nonummy)
Jan 31 13:08:10 ubuntu.local ticky: INFO Closed ticket [#8685] (rr.robinson)
Jan 31 13:48:45 ubuntu.local ticky: ERROR The ticket was modified while updating (breee)
Jan 31 14:13:00 ubuntu.local ticky: INFO Commented on ticket [#4225] (noel)
Jan 31 14:38:50 ubuntu.local ticky: ERROR The ticket was modified while updating (enim.non)
Jan 31 14:41:18 ubuntu.local ticky: ERROR Timeout while retrieving information (xlg)
Jan 31 14:45:55 ubuntu.local ticky: INFO Closed ticket [#7948] (noel)
Jan 31 14:50:41 ubuntu.local ticky: INFO Commented on ticket [#8628] (noel)
Jan 31 14:56:35 ubuntu.local ticky: ERROR Tried to add information to closed ticket (noel)
Jan 31 15:27:53 ubuntu.local ticky: ERROR Ticket doesn't exist (blossom)
Jan 31 15:28:15 ubuntu.local ticky: ERROR Permission denied while closing ticket (enim.non)
Jan 31 15:44:25 ubuntu.local ticky: INFO Closed ticket [#7333] (enim.non)
Jan 31 16:17:20 ubuntu.local ticky: INFO Commented on ticket [#1653] (noel)
Jan 31 16:19:40 ubuntu.local ticky: ERROR The ticket was modified while updating (mdouglas)
Jan 31 16:24:31 ubuntu.local ticky: INFO Created ticket [#5455] (ac)
Jan 31 16:35:46 ubuntu.local ticky: ERROR Timeout while retrieving information (oren)
Jan 31 16:53:54 ubuntu.local ticky: INFO Commented on ticket [#3813] (mcintosh)
Jan 31 16:54:18 ubuntu.local ticky: ERROR Connection to DB failed (bpacheco)
Jan 31 17:15:47 ubuntu.local ticky: ERROR The ticket was modified while updating (mcintosh)
Jan 31 17:29:11 ubuntu.local ticky: ERROR Connection to DB failed (oren)
Jan 31 17:51:52 ubuntu.local ticky: INFO Closed ticket [#8604] (mcintosh)
Jan 31 18:09:17 ubuntu.local ticky: ERROR The ticket was modified while updating (noel)
Jan 31 18:43:01 ubuntu.local ticky: ERROR Ticket doesn't exist (nonummy)
Jan 31 19:00:23 ubuntu.local ticky: ERROR Timeout while retrieving information (blossom)
Jan 31 19:20:22 ubuntu.local ticky: ERROR Timeout while retrieving information (mai.hendrix)
Jan 31 19:59:06 ubuntu.local ticky: INFO Created ticket [#6361] (enim.non)
Jan 31 20:02:41 ubuntu.local ticky: ERROR Timeout while retrieving information (xlg)
Jan 31 20:21:55 ubuntu.local ticky: INFO Commented on ticket [#7159] (ahmed.miller)
Jan 31 20:28:26 ubuntu.local ticky: ERROR Connection to DB failed (breee)
Jan 31 20:35:17 ubuntu.local ticky: INFO Created ticket [#7737] (nonummy)
Jan 31 20:48:02 ubuntu.local ticky: ERROR Connection to DB failed (mdouglas)
Jan 31 20:56:58 ubuntu.local ticky: INFO Closed ticket [#4372] (oren)
Jan 31 21:00:23 ubuntu.local ticky: INFO Commented on ticket [#2389] (sri)
Jan 31 21:02:06 ubuntu.local ticky: ERROR Connection to DB failed (breee)
Jan 31 21:20:33 ubuntu.local ticky: INFO Closed ticket [#3297] (kirknixon)
Jan 31 21:29:24 ubuntu.local ticky: ERROR The ticket was modified while updating (blossom)
Jan 31 22:58:55 ubuntu.local ticky: INFO Created ticket [#2461] (jackowens)
Jan 31 23:25:18 ubuntu.local ticky: INFO Closed ticket [#9876] (blossom)
Jan 31 23:35:40 ubuntu.local ticky: INFO Created ticket [#5896] (mcintosh)
ticky_check.py

Usage: ./ticky_check.py

#!/usr/bin/env python3
import sys
import re
import operator
import csv

# Dict: Count number of entries for each user
per_user = {}  # Splitting between INFO and ERROR
# Dict: Number of different error messages
errors = {}

# * Read file and create dictionaries
with open('syslog.log') as file:
    for line in file:
        # * Sample line of the log file
        # "May 27 11:45:40 ubuntu.local ticky: INFO: Created ticket [#1234] (username)"
        match = re.search(
            r"ticky: (INFO|ERROR):? ([\w' ]+?) ?(?:\[#\d+\] ?)?\((.+)\)$", line)
        if match is None:
            continue  # skip lines that do not look like ticky entries
        code, error_msg, user = match.groups()

        # Count each distinct ERROR message (INFO messages are not errors)
        if code == 'ERROR':
            errors[error_msg] = errors.get(error_msg, 0) + 1

        # Count INFO and ERROR entries per user
        if user not in per_user:
            per_user[user] = {'INFO': 0, 'ERROR': 0}
        per_user[user][code] += 1


# Sorted by VALUE (Most common to least common)
errors_list = sorted(errors.items(), key=operator.itemgetter(1), reverse=True)

# Sorted by USERNAME
per_user_list = sorted(per_user.items(), key=operator.itemgetter(0))

# Insert the header rows at the beginning of each list
errors_list.insert(0, ('Error', 'Count'))
per_user_list.insert(0, ('Username', {'INFO': 'INFO', 'ERROR': 'ERROR'}))

# * Create CSV file user_statistics
with open('user_statistics.csv', 'w', newline='') as user_csv:
    for key, value in per_user_list:
        user_csv.write(str(key) + ',' +
                       str(value['INFO']) + ',' + str(value['ERROR'])+'\n')

# * Create CSV error_message
with open('error_message.csv', 'w', newline='') as error_csv:
    for key, value in errors_list:
        error_csv.write(str(key) + ',' + str(value) + '\n')
csv_to_html.py

Usage: ./csv_to_html.py user_statistics.csv /var/www/html/<html-filename>.html

#!/usr/bin/env python3


import sys
import csv
import os

def process_csv(csv_file):
    """Turn the contents of the CSV file into a list of lists"""
    print("Processing {}".format(csv_file))
    with open(csv_file,"r") as datafile:
        data = list(csv.reader(datafile))
    return data

def data_to_html(title, data):
    """Turns a list of lists into an HTML table"""

    # HTML Headers
    html_content = """
<html>
<head>
<style>
table {
    width: 25%;
    font-family: arial, sans-serif;
    border-collapse: collapse;
}

tr:nth-child(odd) {
    background-color: #dddddd;
}

td, th {
    border: 1px solid #dddddd;
    text-align: left;
    padding: 8px;
}
</style>
</head>
<body>
"""


    # Add the header part with the given title
    html_content += "<h2>{}</h2><table>".format(title)

    # Add each row in data as a row in the table
    # The first line is special and gets treated separately
    for i, row in enumerate(data):
        html_content += "<tr>"
        for column in row:
            if i == 0:
                html_content += "<th>{}</th>".format(column)
            else:
                html_content += "<td>{}</td>".format(column)
        html_content += "</tr>"

    html_content += """</table></body></html>"""
    return html_content


def write_html_file(html_string, html_file):

    # Making a note of whether the html file we're writing exists or not
    if os.path.exists(html_file):
        print("{} already exists. Overwriting...".format(html_file))

    with open(html_file,'w') as htmlfile:
        htmlfile.write(html_string)
    print("Table successfully written to {}".format(html_file))

def main():
    """Verifies the arguments and then calls the processing function"""
    # Check that command-line arguments are included
    if len(sys.argv) < 3:
        print("ERROR: Missing command-line argument!")
        print("Exiting program...")
        sys.exit(1)

    # Open the files
    csv_file = sys.argv[1]
    html_file = sys.argv[2]

    # Check that file extensions are included
    if ".csv" not in csv_file:
        print('Missing ".csv" file extension from first command-line argument!')
        print("Exiting program...")
        sys.exit(1)

    if ".html" not in html_file:
        print('Missing ".html" file extension from second command-line argument!')
        print("Exiting program...")
        sys.exit(1)

    # Check that the csv file exists
    if not os.path.exists(csv_file):
        print("{} does not exist".format(csv_file))
        print("Exiting program...")
        sys.exit(1)

    # Process the data and turn it into an HTML
    data = process_csv(csv_file)
    title = os.path.splitext(os.path.basename(csv_file))[0].replace("_", " ").title()
    html_string = data_to_html(title, data)
    write_html_file(html_string, html_file)

if __name__ == "__main__":
    main()

Google Python Course

Course 2

Understanding Slowness

Slow Web Server

ab - Apache benchmark tool

ab -n 500 site.example.com/

Profiling - Improving the code

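Profiling helps software engineers design efficient and effective applications by monitoring and analyzing resource usage in real time. For IT professionals, the ability to profile is an invaluable tool. Although profiling is not a new technique, it remains just as relevant today: it improves responsiveness and optimizes resource usage, which lays a solid foundation for software development.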

A profiler is a tool that measures the resources that our code is using, giving us a better understanding of what's going on.
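
As a minimal sketch of what a profiler reports, the example below uses cProfile and pstats from the standard library. The functions slow_sum and profile_call are hypothetical names made up for this illustration; they are not part of the course material.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive: builds a full list before summing it."""
    return sum([i * i for i in range(n)])


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 100_000)
print(result)
print(report)
```

The report lists, for each function, how many times it was called and how much time was spent in it, which is usually enough to spot the part of the code worth optimizing.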

Parallelizing operations

Python modules

Concurrency for I/O-bound tasks

Python has two main approaches to implementing concurrency: threading and asyncio.

  1. Threading is an efficient method for overlapping waiting times. This makes it well-suited for tasks involving many I/O operations, such as file I/O or network operations that spend significant time waiting. There are however some limitations with threading in Python due to the Global Interpreter Lock (GIL), which can limit the utilization of multiple cores.

  2. Alternatively, asyncio uses an event loop to manage task switching. Tasks yield control cooperatively at each await, so a single thread can juggle many pending I/O operations without the overhead of creating and switching between threads. Because everything runs in one thread, the GIL is not a source of contention, but for the same reason asyncio does not speed up CPU-bound work. Applications that spend most of their time reading and writing data, especially over many simultaneous connections, benefit the most.

Python supports concurrent execution through both threading and asyncio. Both overlap waiting time in I/O-bound programs; asyncio tends to scale better when there are very many concurrent operations, provided the libraries in use support async I/O.
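
The overlap of waiting times can be sketched with asyncio as follows. This is an illustration only: fake_download is a hypothetical stand-in for a real I/O call, and asyncio.sleep simulates the time spent waiting.

```python
import asyncio
import time


async def fake_download(name, delay):
    """Stand-in for an I/O-bound call; asyncio.sleep simulates the wait."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_all():
    # gather() schedules the coroutines together, so their waits overlap
    return await asyncio.gather(
        fake_download("a", 0.2),
        fake_download("b", 0.2),
        fake_download("c", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(run_all())
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Run one after another, the three waits would add up to about 0.6 seconds; run concurrently, the total stays close to the longest single wait.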

Parallelism for CPU-bound tasks

Parallelism is a powerful technique for programs that heavily rely on the CPU to process large volumes of data constantly. It's especially useful for CPU-bound tasks like calculations, simulations, and data processing.

Instead of interleaving and executing tasks concurrently, parallelism enables multiple tasks to run simultaneously on multiple CPU cores. This is crucial for applications that require significant CPU resources to handle intense computations in real-time.

Python's multiprocessing library facilitates parallel execution by distributing tasks across multiple CPU cores. Each process gets its own Python interpreter and memory space, so CPU-bound programs are not serialized by the GIL and avoid the conflicts and slowdowns caused by sharing resources. Keep in mind that running multiple tasks simultaneously requires careful resource management: every process adds memory and startup overhead, and data passed between processes has to be serialized.
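
A minimal sketch of distributing CPU-bound work across processes, using concurrent.futures.ProcessPoolExecutor (which is built on top of multiprocessing). The functions count_primes and count_in_parallel are hypothetical names chosen for this example.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_in_parallel(limits, workers=2):
    """Run count_primes for each limit in a separate worker process."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import
    print(count_in_parallel([10_000, 20_000, 30_000]))
```

Each call to count_primes runs in its own process with its own interpreter, so the calls can occupy different CPU cores at the same time.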

Combining concurrency and parallelism

Combining concurrency and parallelism can improve performance. In certain complex applications with both I/O-bound and CPU-bound tasks, you can use asyncio for concurrency and multiprocessing for parallelism.

With asyncio, you make I/O-bound tasks more efficient as the program can do other things while waiting for file operations.

On the other hand, multiprocessing allows you to distribute CPU-bound computations, like heavy calculations, across multiple processors for faster execution.

By combining these techniques, you can create a well-optimized and responsive program. Your I/O-bound tasks benefit from concurrency, while CPU-bound tasks leverage parallelism.
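
One way to combine the two, sketched under the assumption that each item needs an I/O-bound fetch followed by a CPU-bound computation. The names fetch, checksum and process are hypothetical; asyncio.sleep stands in for a real network or file read.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def checksum(data):
    """CPU-bound step: a toy checksum over the bytes."""
    total = 0
    for byte in data:
        total = (total * 31 + byte) % 1_000_003
    return total


async def fetch(name):
    """I/O-bound step; asyncio.sleep stands in for a real network read."""
    await asyncio.sleep(0.1)
    return name.encode() * 1000


async def process(name, pool):
    data = await fetch(name)  # other tasks run while this one waits
    loop = asyncio.get_running_loop()
    # Hand the heavy computation to a worker process
    return await loop.run_in_executor(pool, checksum, data)


async def main(names):
    with ProcessPoolExecutor(max_workers=2) as pool:
        return await asyncio.gather(*(process(n, pool) for n in names))


if __name__ == "__main__":
    print(asyncio.run(main(["alpha", "beta", "gamma"])))
```

The event loop keeps the fetches overlapping, while run_in_executor moves each checksum onto a separate process so the loop itself is never blocked by computation.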

psutil

# Installation
pip3 install psutil

Usage

import psutil

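# Check CPU usage (percent, measured over a 1-second interval;
# without an interval the first call returns a meaningless 0.0)
psutil.cpu_percent(interval=1)

# Check disk I/O counters
psutil.disk_io_counters()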

# For checking the network I/O bandwidth:
psutil.net_io_counters()
rsync with python

Use the rsync command in Python

import subprocess
src = "<source-path>" # replace <source-path> with the source directory
dest = "<destination-path>" # replace <destination-path> with the destination directory

subprocess.call(["rsync", "-arq", src, dest])

Segmentation fault

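Segmentation fault: this usually happens in programs written in low-level languages such as C and C++. These programs manage memory allocation themselves, and when a program tries to access an invalid memory address, it crashes and exits with this error.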

gdb
ulimit -c unlimited
gdb -c core example

gdb sub-commands

gdb -c core example
....
(gdb) backtrace
....
(gdb) up
...
list
...
print i
...
print argv[0]
...
print argv[1]

Python Cheat Sheet

String Methods

python_string_method.jpg

Set/List/Dictionary Methods

python_list.jpg

List methods

python_list_2.jpg

List methods

python_list_methods.jpg

Data Structures

python_data_structure.jpg

 

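Set

Use a set when you want to store a collection of elements and make sure each element appears only once. The elements of a set must also be immutable. You can think of a set as the keys of a dictionary without any associated values.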

A = {"jlanksy", "drosas", "nmason"}

# Create an empty set
B = set()

# A set does not contain duplicate elements
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
print(basket)                      # show that duplicates have been removed
# Output: {'orange', 'banana', 'pear', 'apple'}

Methods

.add()

.add() adds an element

s = {1, 2, 3, 4, 5}
s.add(6)
s.add(7)
s.add(7)

print(s)
# Output {1, 2, 3, 4, 5, 6, 7}
.remove()

.remove() removes an element

s = {1, 2, 3, 4, 5}
s.remove(5)
#s.remove(6) # Error

print(s)
# Output {1, 2, 3, 4}
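
The commented-out s.remove(6) above fails because remove() raises KeyError for a missing element. A short sketch of that behavior, next to discard(), which ignores missing elements:

```python
s = {1, 2, 3, 4, 5}

# remove() raises KeyError when the element is missing
try:
    s.remove(6)
except KeyError as err:
    print("KeyError:", err)

# discard() silently ignores a missing element
s.discard(6)
s.discard(5)

print(s)
```

Use remove() when a missing element indicates a bug, and discard() when it does not matter whether the element was present.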

Examples

Membership test: element in set
fruits = {'apple','banana','orange','lemon'}
print('tomato' in fruits)    # Output False
result = 'apple' in fruits
print(result)                # Output True
Set intersection
fruits1 = {'apple','banana','orange','lemon'}
fruits2 = {'tomato','apple','banana'}
print(fruits1 & fruits2)   # Output {'apple', 'banana'}
print(fruits2 & fruits1)   # Output {'apple', 'banana'}
nums1 = {1,2,3,4,5}
nums2 = {2,4,6,8,10}
print(nums1.intersection(nums2))  # Output {2, 4}
print(nums2.intersection(nums1))  # Output {2, 4}
Set union
fruits1 = {'apple','banana','orange','lemon'}
fruits2 = {'tomato','apple','banana'}
print(fruits1 | fruits2)  # Output {'orange', 'banana', 'tomato', 'lemon', 'apple'}
print(fruits2 | fruits1)  # Output {'orange', 'banana', 'tomato', 'lemon', 'apple'}
nums1 = {1,2,3,4,5}
nums2 = {2,4,6,8,10}
print(nums1.union(nums2))  # Output {1, 2, 3, 4, 5, 6, 8, 10}
print(nums2.union(nums1))  # Output {1, 2, 3, 4, 5, 6, 8, 10}
Set difference
fruits1 = {'apple','banana','orange','lemon'}
fruits2 = {'orange','lemon','tomato'}
print(fruits1 - fruits2)  # Output {'apple', 'banana'}
print(fruits2 - fruits1)  # Output {'tomato'}
nums1 = {1,2,3,4,5}
nums2 = {4,5,6,7,8}
print(nums1.difference(nums2))  # Output {1, 2, 3}
print(nums2.difference(nums1))  # Output {8, 6, 7}
Set symmetric difference
fruits1 = {'apple','banana','orange','lemon'}
fruits2 = {'orange','lemon','tomato'}
print(fruits1 ^ fruits2)  # Output {'tomato', 'banana', 'apple'}
print(fruits2 ^ fruits1)  # Output {'tomato', 'banana', 'apple'}
nums1 = {1,2,3,4,5}
nums2 = {4,5,6,7,8}
print(nums1.symmetric_difference(nums2)) # Output {1, 2, 3, 6, 7, 8}
print(nums2.symmetric_difference(nums1)) # Output {1, 2, 3, 6, 7, 8}

CSV

Reading CSV files

csv_file.txt

Sabrina Green,802-867-5309,System Administrator
Eli Jones,684-3481127,IT specialist
Melody Daniels,846-687-7436,Programmer
Charlie Rivera,698-746-3357,Web Developer

import csv

# "with" closes the file automatically, even if a row fails to unpack
with open("csv_file.txt") as f:
    csv_f = csv.reader(f)
    for row in csv_f:
        name, phone, role = row
        print("Name: {}, Phone: {}, Role: {}".format(name, phone, role))

Output:

Name: Sabrina Green, Phone: 802-867-5309, Role: System Administrator
Name: Eli Jones, Phone: 684-3481127, Role: IT specialist
Name: Melody Daniels, Phone: 846-687-7436, Role: Programmer
Name: Charlie Rivera, Phone: 698-746-3357, Role: Web Developer

Generating CSV

import csv

hosts = [["workstation.local", "192.168.25.46"],["webserver.cloud", "10.2.5.6"]]
with open('hosts.csv', 'w') as hosts_csv:
    writer = csv.writer(hosts_csv)
    writer.writerows(hosts)

With list

Reading a CSV with the list

user_emails.csv

Full Name, Email Address
Blossom Gill, blossom@xyz.edu
Hayes Delgado, nonummy@utnisia.com
Petra Jones, ac@xyz.edu
Oleg Noel, noel@liberomauris.ca
Ahmed Miller, ahmed.miller@nequenonquam.co.uk
Macaulay Douglas, mdouglas@xyz.edu
Aurora Grant, enim.non@xyz.edu
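
import csv

csv_file_location = "user_emails.csv"  # path to the CSV file shown above
user_email_list = []

with open(csv_file_location, 'r') as f:
    user_data_list = list(csv.reader(f))
    # Skip the header row and strip the space that follows each comma
    user_email_list = [data[1].strip() for data in user_data_list[1:]]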

With dictionary

Reading a CSV with the dictionary

# software.csv
# name,version,status,users
# MailTree,5.34,production,324
# CalDoor,1.25.1,beta,22
# Chatty Chicken,0.34,alpha,4

import csv

with open('software.csv') as software:
    reader = csv.DictReader(software)
    for row in reader:
        print("{} has {} users".format(row["name"], row["users"]))

# Output:
# MailTree has 324 users
# CalDoor has 22 users
# Chatty Chicken has 4 users

Writing a CSV with the dictionary

import csv

users = [{"name": "Sol Mansi", "username": "solm", "department": "IT infrastructure"},
         {"name": "Lio Nelson", "username": "lion", "department": "User Experience Research"},
         {"name": "Charlie Grey", "username": "greyc", "department": "Development"}]
keys = ["name", "username", "department"]
# newline='' lets the csv module control line endings itself
with open('by_department.csv', 'w', newline='') as by_department:
    writer = csv.DictWriter(by_department, fieldnames=keys)
    writer.writeheader()
    writer.writerows(users)

# by_department.csv:
# name,username,department
# Sol Mansi,solm,IT infrastructure
# Lio Nelson,lion,User Experience Research
# Charlie Grey,greyc,Development

Errors and Exceptions

Use cases:

Try-Except

import sys

# usage(), message_template() and send_message() are assumed to be helpers
# defined elsewhere in the original script; they are not shown here.
def main():
    if len(sys.argv) < 2:
        return usage()

    try:
        date, title, emails = sys.argv[1].split('|')
        message = message_template(date, title)
        send_message(message, emails)
        print("Successfully sent reminders to:", emails)
    except Exception as e:
        # Include the exception text so the cause of the failure is visible
        print("Failure to send email: {}".format(e), file=sys.stderr)

def character_frequency(filename):
  """Counts the frequency of each character in the given file."""
  # First try to open the file
  try:
    f = open(filename)
  except OSError:
    return None

  # Now process the file
  characters = {}
  for line in f:
    for char in line:
      characters[char] = characters.get(char, 0) + 1
  f.close() 
  return characters
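
# Custom exception types used below (assumed definitions; the original omits them)
class InvalidInputError(Exception):
    pass


class EmptyInputError(Exception):
    pass


def calculate_average(numbers):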
    try:
        return sum(numbers) / len(numbers)
    except TypeError:
        raise InvalidInputError(f"Expected a list or tuple, but got {type(numbers)}")
    except ZeroDivisionError:
        raise EmptyInputError("The list is empty. Cannot calculate the average.")
    finally:
        print("Execution of calculate_average function completed.")

 

Raise

def validate_user(username, minlen):
  assert type(username) == str, "username must be a string"
  if minlen < 1:
    raise ValueError("minlen must be at least 1")

  if len(username) < minlen:
    return False
  if not username.isalnum():
    return False
  return True

For unit test

import unittest

from validations import validate_user

class TestValidateUser(unittest.TestCase):
  def test_valid(self):
    self.assertEqual(validate_user("validuser", 3), True)

  def test_too_short(self):
    self.assertEqual(validate_user("inv", 5), False)

  def test_invalid_characters(self):
    self.assertEqual(validate_user("invalid_user", 1), False)
    
  def test_invalid_minlen(self):
    self.assertRaises(ValueError, validate_user, "user", -1)


# Run the tests
unittest.main()

def enhanced_read_and_divide(filename):
    try:
        with open(filename, 'r') as file:
            data = file.readlines()

        # Ensure there are at least two lines in the file
        if len(data) < 2:
            raise ValueError("Not enough data in the file.")

        num1 = int(data[0])
        num2 = int(data[1])

        # Check if second number is zero
        if num2 == 0:
            raise ZeroDivisionError("The denominator is zero.")

        return num1 / num2

    except FileNotFoundError:
        return "Error: The file was not found."
    except ValueError as ve:
        return f"Value error: {ve}"
    except ZeroDivisionError as zde:
        return f"Division error: {zde}"

Examples

User's emails

user_emails.csv :

Blossom Gill,blossom@abc.edu
Hayes Delgado,nonummy@abc.edu
Petra Jones,ac@abc.edu
Oleg Noel,noel@abc.edu
Ahmed Miller,ahmed.miller@abc.edu
Macaulay Douglas,mdouglas@abc.edu
Aurora Grant,enim.non@abc.edu
Madison Mcintosh,mcintosh@abc.edu
Montana Powell,montanap@abc.edu
Rogan Robinson,rr.robinson@abc.edu
Simon Rivera,sri@abc.edu
Benedict Pacheco,bpacheco@abc.edu
Maisie Hendrix,mai.hendrix@abc.edu
Xaviera Gould,xlg@abc.edu
Oren Rollins,oren@abc.edu
Flavia Santiago,flavia@abc.edu
Jackson Owens,jacksonowens@abc.edu
Britanni Humphrey,britanni@abc.edu
Kirk Nixon,kirknixon@abc.edu
Bree Campbell,breee@abc.edu

emails.py : Main program

#!/usr/bin/env python3

import sys
import csv

def populate_dictionary(filename): 
  """Populate a dictionary with name/email pairs for easy lookup."""
  email_dict = {}
  with open(filename) as csvfile:
    lines = csv.reader(csvfile, delimiter = ',')
    for row in lines:
      name = str(row[0].lower())
      email_dict[name] = row[1]
  return email_dict

def find_email(argv):
  """ Return an email address based on the username given."""
  # Create the username based on the command line input.
  try:
    fullname = str(argv[1] + " " + argv[2])
    # Preprocess the data
    email_dict = populate_dictionary('/home/student/data/user_emails.csv')
    # Find and print the email
    if email_dict.get(fullname.lower()):
      return email_dict.get(fullname.lower())
    else:
      return "No email address found"
  except IndexError:
    return "Missing parameters"

def main():
  print(find_email(sys.argv))

if __name__ == "__main__":
  main()

emails_test.py : For unit test

#!/usr/bin/env python3
import unittest
from emails import find_email

class EmailsTest(unittest.TestCase):
  def test_basic(self):
    testcase = [None, "Bree", "Campbell"]
    expected = "breee@abc.edu"
    self.assertEqual(find_email(testcase), expected)

  def test_one_name(self):
    testcase = [None, "John"]
    expected = "Missing parameters"
    self.assertEqual(find_email(testcase), expected)

  def test_two_name(self):
    testcase = [None, "Roy", "Cooper"]
    expected = "No email address found"
    self.assertEqual(find_email(testcase), expected)

if __name__ == '__main__':
  unittest.main()


Binary Search

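Binary search is an efficient search algorithm for finding the position or value of a specific element in a sorted list.

Prerequisite:

The data must already be sorted, in either ascending or descending order, because binary search relies on the ordering to narrow the search range efficiently.

Steps:

  1. Initialize the boundaries: set the left boundary left to 0 and the right boundary right to the index of the last element.
  2. Repeat the following until left is greater than right (the comparisons below assume ascending order):
    • Compute the middle index: mid = (left + right) // 2.
    • Compare the middle element arr[mid] with the target element target:
      • If arr[mid] equals target, the target has been found; return mid.
      • If arr[mid] is greater than target, set right to mid - 1 to narrow the search to the left half.
      • If arr[mid] is less than target, set left to mid + 1 to narrow the search to the right half.
  3. If the target is not found within the search range, return -1 to indicate that it is not in the list.

Characteristics:

Binary search is particularly well suited to finding a target in sorted data. Each comparison halves the search range, so it takes O(log n) steps, compared with O(n) for linear search, which makes it especially effective on large data sets.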

def linear_search(list, key):
    """If key is in the list returns its position in the list,
       otherwise returns -1."""
    for i, item in enumerate(list):
        if item == key:
            return i
    return -1

def binary_search(list, key):
    """Returns the position of key in the sorted list if found, -1 otherwise.

    The list is sorted in place first, so the returned index refers to
    the sorted order, not the caller's original order.
    """

    list.sort()                       # sort the list in place
    left, right = 0, len(list) - 1    # initialize the left and right boundaries

    while left <= right:
        middle = (left + right) // 2  # compute the middle index

        if list[middle] == key:
            return middle             # target found: return its index
        elif list[middle] > key:
            right = middle - 1        # narrow the search to the left half
        else:
            left = middle + 1         # narrow the search to the right half
    return -1                         # target is not in the list


# Test
my_list = [2, 4, 7, 12, 15, 21, 30, 34, 42]
target_number = 15

result = binary_search(my_list, target_number)

if result != -1:
    print(f"Target number {target_number} is in the list at index {result}")
else:
    print(f"Target number {target_number} is not in the list")
Example2: Binary Search
def find_item(list, item):
  #Returns True if the item is in the list, False if not.
  if len(list) == 0:
    return False

  list.sort()
  #Is the item in the center of the list?
  middle = len(list)//2
  if list[middle] == item:
    return True

  #Is the item in the first half of the list? 
  if item < list[middle]:
    #Call the function with the first half of the list
    return find_item(list[:middle], item)
  else:
    #Call the function with the second half of the list
    return find_item(list[middle+1:], item)

  return False

list_of_names = ["Parker", "Drew", "Cameron", "Logan", "Alex", "Chris", "Terry", "Jamie", "Jordan", "Taylor"]

print(find_item(list_of_names, "Alex")) # True
print(find_item(list_of_names, "Andrew")) # False
print(find_item(list_of_names, "Drew")) # True
print(find_item(list_of_names, "Jared")) # False

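Use cases
  1. Finding an element: the most common use is looking up a specific element in a sorted array or list. Because the data is sorted, the search range shrinks quickly, which makes lookups fast.
  2. Dictionary or vocabulary lookup: binary search can be used to look up a word in a dictionary or word list, especially when the entries are in alphabetical order.
  3. Inventory management systems: look up the stock information of a specific product or item. Inventory entries are usually sorted by product number or name.
  4. Solving equations: in mathematical applications, binary search can solve an equation or find its roots by repeatedly narrowing the range of possible solutions.
  5. Game development: implement features such as finding a player's position on a leaderboard or checking whether an object falls within a given range.
  6. Calendar applications: look up a specific date, especially when the entries are sorted by date.
  7. Simple sorting: although binary search is primarily a search algorithm, it can also help with sorting. It can find the position where an element should be inserted, as in insertion sort.
  8. Music players: look up a specific song or artist, especially when the library is sorted by title or artist name.
  9. Route planning: in map or route-planning applications, find the nearest location or path to speed up the search.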
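
For the lookup and insertion-position use cases, Python's standard library already provides binary search in the bisect module. The sketch below is an illustration; index_of and scores are hypothetical names.

```python
import bisect


def index_of(sorted_items, key):
    """Return the index of key in sorted_items, or -1 if it is absent."""
    i = bisect.bisect_left(sorted_items, key)
    if i < len(sorted_items) and sorted_items[i] == key:
        return i
    return -1


scores = [2, 4, 7, 12, 15, 21, 30, 34, 42]
print(index_of(scores, 15))
print(index_of(scores, 16))

# insort() finds the insertion point by binary search and keeps the list sorted
bisect.insort(scores, 16)
print(scores)
```

Unlike the hand-written versions in this section, bisect does not sort the list for you; the caller has to keep it sorted.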
Linear vs. Binary Search
def linear_search(list, key):
    #Returns the number of steps to determine if key is in the list 

    #Initialize the counter of steps
    steps=0
    for i, item in enumerate(list):
        steps += 1
        if item == key:
            break
    return steps 

def binary_search(list, key):
    #Returns the number of steps to determine if key is in the list 

    #List must be sorted:
    list.sort()

    #The Sort was 1 step, so initialize the counter of steps to 1
    steps=1

    left = 0
    right = len(list) - 1
    while left <= right:
        steps += 1
        middle = (left + right) // 2
        
        if list[middle] == key:
            break
        if list[middle] > key:
            right = middle - 1
        if list[middle] < key:
            left = middle + 1
    return steps 

def best_search(list, key):
    steps_linear = linear_search(list, key) 
    steps_binary = binary_search(list, key) 
    results = "Linear: " + str(steps_linear) + " steps, "
    results += "Binary: " + str(steps_binary) + " steps. "
    if (steps_linear < steps_binary):
        results += "Best Search is Linear."
    elif (steps_linear > steps_binary):
        results += "Best Search is Binary."
    else:
        results += "Result is a Tie."

    return results

print(best_search([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 1))
#Should be: Linear: 1 steps, Binary: 4 steps. Best Search is Linear.

print(best_search([10, 2, 9, 1, 7, 5, 3, 4, 6, 8], 1))
#Should be: Linear: 4 steps, Binary: 4 steps. Result is a Tie.

print(best_search([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], 7))
#Should be: Linear: 4 steps, Binary: 5 steps. Best Search is Linear.

print(best_search([1, 3, 5, 7, 9, 10, 2, 4, 6, 8], 10))
#Should be: Linear: 6 steps, Binary: 5 steps. Best Search is Binary.

print(best_search([5, 1, 8, 2, 4, 10, 7, 6, 3, 9], 11))
#Should be: Linear: 10 steps, Binary: 5 steps. Best Search is Binary.

 

Debug

Debugging

assert
x = 5
assert x == 5, "x should be 5"

assert type(username) == str, "username must be a string"
printf debugging
print("Processing {}".format(basename))
strace
# Installation on RHEL if it's not installed
yum install strace

# Tracing system calls made by a program
strace ./my-program.py
strace -o my-program.strace ./my-program.py

Crash

pdb

Features:

pdb3 myprog.py

pdb-subcommands

(Pdb) continue
...
(Pdb) print(row)

Step 1: Set a breakpoint

import pdb


def add_numbers(a, b):
    pdb.set_trace()  # This will set a breakpoint in the code
    result = a + b
    return result


print(add_numbers(3, 4))

Step 2: Enter the interactive debugger

Step 3: Inspect variables

To inspect the variables, simply type the single character, p, then the variable name to see its current value. For instance, if you have a variable in your code named sentiment_score, just type p sentiment_score at the pdb prompt to inspect its value.

Step 4: Modify variables

A big advantage of pdb is that you can change the value of a variable directly in the debugger. For example, to change sentiment_score to 0.9, you'd type !sentiment_score = 0.9.

To confirm the change, use a (args) to print the arguments of the current function, or inspect the value directly with p <variable name>.

Step 5: Exit the debugger

When you’re done, simply enter q (quit) to exit the debugger and terminate the program.

Post-mortem debugging

python -m pdb your_script.py

Memory Leaks

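A memory leak occurs when memory that is no longer needed is not released. An application that still requires a large amount of memory even after being restarted is most likely suffering from a memory leak.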
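
As a rough sketch of how such growth can be located, the standard library's tracemalloc module can compare two snapshots of allocated memory. The leak here is simulated: leaky_cache and handle_request are hypothetical names invented for this example.

```python
import tracemalloc

leaky_cache = []  # hypothetical global that is never cleared


def handle_request(payload):
    """Simulated leak: every call keeps a copy of the payload forever."""
    leaky_cache.append(payload * 100)


tracemalloc.start()
before = tracemalloc.take_snapshot()

for i in range(1000):
    handle_request(f"request-{i}")

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Compare the snapshots and list the lines that allocated the most new memory
growth = after.compare_to(before, "lineno")
for stat in growth[:3]:
    print(stat)
```

The line that appends to leaky_cache should appear at the top of the list, which points directly at the code that keeps memory alive.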

memory_profiler

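In the output, the Mem usage column shows the memory in use after each line has run, and the Increment column shows how much memory each line added.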

python3 -m memory_profiler myprog.py

In Code

from memory_profiler import profile

...
...

@profile
def main():
  ...
  ...