Linux bash shell script batch download files All In One
solution
pdf crawler
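#!/bin/bash
# Download directory
downdir="/Users/xgqfrms-mbp/Documents/swift-ui/Memorize/000-xyz/pdfs/"
# $1 is the first argument passed to the script (the list file)
# read -r reads the file line by line without interpreting backslashes
while read -r line
do
  # Wrap shell variables in double quotes when expanding them
  echo "$line"
  # Stop if the download directory does not exist
  cd "$downdir" || exit 1
  # Split each "url;filename" line on ";" into an array
  IFS=';' read -r -a array <<< "$line"
  echo "${array[@]}"
  url=${array[0]}
  # tr deletes the carriage return (\r) left by Windows line endings ✅
  filename=$(echo "${array[1]}" | tr -d '\r')
  # filename=$(echo "l${index}.pdf" | tr -d '\r')
  # curl performs the download; -o names the output file
  curl "$url" -o "$filename"
done < "$1"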
# exit 0
# mkdir pdfs
$ bash ./auto-download-pdfs.sh cs193p.txt
# OR, make the script executable
$ chmod +x ./auto-download-pdfs.sh
$ ./auto-download-pdfs.sh cs193p.txt
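To check how each list entry is split before downloading anything, a dry run can help. The following is a minimal sketch, not the script above: the function names parse_entry and dry_run and the example.com URL are made up for illustration, and it assumes every line has the form url;filename.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): split one
# "url;filename" entry and print both parts.
parse_entry() {
  # Delete any carriage return left by Windows (CRLF) line endings
  entry=$(printf '%s' "$1" | tr -d '\r')
  # Everything before the first ';' is the URL, everything after it the file name
  url=${entry%%;*}
  filename=${entry#*;}
  printf '%s -> %s\n' "$url" "$filename"
}

# Dry run: print what would be downloaded for each line of a list file
dry_run() {
  # '|| [ -n "$line" ]' also processes a last line that lacks a trailing newline
  while read -r line || [ -n "$line" ]; do
    parse_entry "$line"
  done < "$1"
}

# Example with a made-up URL
parse_entry 'https://example.com/files/l1.pdf;l1.pdf'
```

Calling dry_run cs193p.txt prints one "url -> filename" line per entry without touching the network.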
cs193p.txt
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l1.pdf;l1.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l2.pdf;l2.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l3.pdf;l3.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l4.pdf;l4.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l5.pdf;l5.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l6.pdf;l6.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l7.pdf;l7.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l8.pdf;l8.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l9.pdf;l9.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l10.pdf;l10.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l11.pdf;l11.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l12.pdf;l12.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l13.pdf;l13.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l14.pdf;l14.pdf
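Two entries that share an output file name would silently overwrite each other, so it can be worth checking a list file for repeated names before downloading. A minimal sketch, assuming the url;filename format shown here; the function name find_duplicate_names and the example.com entries are hypothetical:

```shell
#!/bin/bash
# Hypothetical check (not in the original post): print every output
# file name that appears more than once in a "url;filename" list.
find_duplicate_names() {
  # tr strips carriage returns, cut keeps the 2nd ';'-separated field,
  # sort groups equal names, uniq -d prints only the repeated ones
  tr -d '\r' < "$1" | cut -d ';' -f 2 | sort | uniq -d
}

# Demo with a made-up list in a temporary file
list=$(mktemp)
printf '%s\n' \
  'https://example.com/l12.pdf;l12.pdf' \
  'https://example.com/l13.pdf;l12.pdf' \
  'https://example.com/l14.pdf;l14.pdf' > "$list"
find_duplicate_names "$list"   # prints l12.pdf
rm -f "$list"
```

An empty result from find_duplicate_names cs193p.txt means every entry writes to its own file.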
demos
CS193p PDFs, 2020 Spring L1 ~ L14
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l1.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l2.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l3.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l4.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l5.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l6.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l7.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l8.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l9.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l10.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l11.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l12.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l13.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l14.pdf
Linux tr
command
tr translates or deletes characters
$ man tr > man-tr.md
$ cat man-tr.md
TR(1) User Commands TR(1)
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard output.
-c, -C, --complement
use the complement of SET1
-d, --delete
delete characters in SET1, do not translate
-s, --squeeze-repeats
replace each sequence of a repeated character that is listed in the last specified SET, with a
single occurrence of that character
-t, --truncate-set1
first truncate SET1 to length of SET2
--help display this help and exit
--version
output version information and exit
SETs are specified as strings of characters. Most represent themselves. Interpreted sequences are:
\NNN character with octal value NNN (1 to 3 octal digits)
\\ backslash
\a audible BEL
\b backspace
\f form feed
\n new line
\r return
\t horizontal tab
\v vertical tab
CHAR1-CHAR2
all characters from CHAR1 to CHAR2 in ascending order
[CHAR*]
in SET2, copies of CHAR until length of SET1
[CHAR*REPEAT]
REPEAT copies of CHAR, REPEAT octal if starting with 0
[:alnum:]
all letters and digits
[:alpha:]
all letters
[:blank:]
all horizontal whitespace
[:cntrl:]
all control characters
[:digit:]
all digits
[:graph:]
all printable characters, not including space
[:lower:]
all lower case letters
[:print:]
all printable characters, including space
[:punct:]
all punctuation characters
[:space:]
all horizontal or vertical whitespace
[:upper:]
all upper case letters
[:xdigit:]
all hexadecimal digits
[=CHAR=]
all characters which are equivalent to CHAR
Translation occurs if -d is not given and both SET1 and SET2 appear. -t may be used only when
translating. SET2 is extended to length of SET1 by repeating its last character as necessary. Excess characters
of SET2 are ignored. Only [:lower:] and [:upper:] are guaranteed to expand in ascending order; used in
SET2 while translating, they may only be used in pairs to specify case conversion. -s uses the last
specified SET, and occurs after translation or deletion.
AUTHOR
Written by Jim Meyering.
REPORTING BUGS
GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
Report any translation bugs to <https://translationproject.org/team/>
COPYRIGHT
Copyright © 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later
<https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent
permitted by law.
SEE ALSO
Full documentation <https://www.gnu.org/software/coreutils/tr>
or available locally via: info '(coreutils) tr invocation'
GNU coreutils 8.32 September 2020 TR(1)
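A few of the options from the man page, tried on short strings. This is only an illustrative sketch; the input strings are made up, and the first command mirrors how the download script strips the carriage return from a file name.

```shell
#!/bin/bash
# -d: delete carriage returns (Windows line endings)
printf 'l1.pdf\r\n' | tr -d '\r'                  # l1.pdf
# Translate lower case to upper case with character classes
printf 'cs193p\n' | tr '[:lower:]' '[:upper:]'    # CS193P
# -s: squeeze runs of spaces into a single space
printf 'a    b   c\n' | tr -s ' '                 # a b c
# Translate SET1 to SET2: every ';' becomes a space
printf 'url;name\n' | tr ';' ' '                  # url name
```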
emmet
Use the VS Code Emmet syntax to generate the URL list dynamically
$index.pdf
p{https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l$.pdf}*14
https://code.visualstudio.com/docs/editor/emmet
https://github.com/emmetio/emmet
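The same numbered list can also be produced in the shell instead of the editor. A minimal sketch, assuming the lecture files are named l1.pdf through l14.pdf under the base URL used in cs193p.txt; the function name generate_list is hypothetical:

```shell
#!/bin/bash
# Base URL copied from the entries in cs193p.txt
base="https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file"

# Hypothetical helper: print "url;filename" entries for lectures $1..$2
generate_list() {
  i=$1
  while [ "$i" -le "$2" ]; do
    printf '%s/l%s.pdf;l%s.pdf\n' "$base" "$i" "$i"
    i=$((i + 1))
  done
}

# Print the first three entries
generate_list 1 3
```

Running generate_list 1 14 > cs193p.txt would write the full list in the format the download script expects.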
TODO
Node.js version
const fs = require("fs");
const path = require("path");
const log = console.log;
// The plain "request" package is callback-based and cannot be awaited;
// request-promise-native wraps it so that get() returns a promise.
const request = require("request-promise-native");
const folder = path.resolve(__dirname, '../pdf');
// log('folder', folder);
if (!fs.existsSync(folder)) {
  fs.mkdirSync(folder, { recursive: true });
}
async function downloadPDF(url, filename) {
  log('🚧 pdf downloading ...');
  // encoding: null makes the response body a Buffer instead of a string
  const pdfBuffer = await request.get({
    uri: url,
    encoding: null,
  });
  fs.writeFileSync(filename, pdfBuffer);
  log('✅ pdf finished!');
}
const url = 'https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l1.pdf';
const filename = path.join(folder, 'cs193p-2021-l1.pdf');
// log('filename =', filename);
// Report a failed download instead of leaving an unhandled rejection
downloadPDF(url, filename).catch((err) => log('❌ pdf download failed:', err.message));
https://www.cnblogs.com/xgqfrms/p/16086580.html
npm package
$ npm i -g auto-download-files
https://www.npmjs.com/package/auto-download-files
Python version
//
refs
TypeScript & Node.js crawler All In One
https://www.cnblogs.com/xgqfrms/p/16086580.html
©xgqfrms 2012-2020
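Original article, all rights reserved ©️xgqfrms. Reproduction without authorization is prohibited 🈲️; infringement will be pursued ⚠️!
This article was first published on cnblogs (博客园). Author: xgqfrms. Original link: https://www.cnblogs.com/xgqfrms/p/16073509.html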