
Linux bash shell script batch download files All In One


solution

PDF crawler

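#!/bin/bash

# Download directory (the trailing "/" is optional)
downdir="/Users/xgqfrms-mbp/Documents/swift-ui/Memorize/000-xyz/pdfs/"
mkdir -p "$downdir"

# $1 is the first argument passed to the script: a list file with one
# "url;filename" entry per line.
# IFS= read -r reads the file line by line without mangling backslashes;
# `|| [ -n "$line" ]` also picks up a last line that has no trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
  # Quote shell variables ("$line") so spaces and glob characters survive
  echo "$line"
  # tr -d '\r' deletes the carriage return left by Windows (CRLF) line endings ✅
  line=$(printf '%s' "$line" | tr -d '\r')
  # Skip blank lines
  [ -z "$line" ] && continue
  # Split on ";" into the URL (field 1) and the output file name (field 2)
  IFS=';' read -r url filename <<< "$line"
  # Download with cURL: -f fails on HTTP errors, -L follows redirects, -o names the output file
  curl -fL "$url" -o "${downdir%/}/$filename"
done < "$1"

# exit 0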
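
Why split each line with `IFS=';' read` rather than an unquoted `${str//;/ }` array? A minimal sketch, using a placeholder example.com URL and a file name that contains spaces: the unquoted expansion turns `;` into a space and then word-splits the whole line, so the file name falls apart (and `*` or `?` would be glob-expanded), while `read` splits only on `;`.

```shell
#!/bin/bash
line='https://example.com/files/a.pdf;lecture 1 notes.pdf'

# Unquoted expansion: ";" becomes a space, then word splitting also breaks
# the file name apart
array=(${line//;/ })
echo "unquoted split: ${#array[@]} fields"   # unquoted split: 4 fields

# IFS=';' read: split only on ";"; the last variable keeps the rest intact
IFS=';' read -r url filename <<< "$line"
echo "url=$url"
echo "filename=$filename"
```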

# mkdir pdfs
$ bash ./auto-download-pdfs.sh cs193p.txt
# Or make the script executable and run it directly
$ chmod +x ./auto-download-pdfs.sh
$ ./auto-download-pdfs.sh cs193p.txt
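
The script reads whatever `$1` happens to be, so running it without an argument fails with a confusing redirection error. A minimal sketch of a guard you could put at the top of auto-download-pdfs.sh; `require_list_file` is a hypothetical helper name, not part of the original script.

```shell
#!/bin/bash
# Hypothetical guard: succeed only when exactly one readable list file is given,
# otherwise print a usage message to stderr and return 1
require_list_file() {
  if [ "$#" -ne 1 ] || [ ! -r "$1" ]; then
    echo "usage: auto-download-pdfs.sh <list-file>" >&2
    return 1
  fi
}

# In the script itself you would call it before the loop:
#   require_list_file "$@" || exit 1
```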

cs193p.txt (one url;filename pair per line, separated by a semicolon)

https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l1.pdf;l1.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l2.pdf;l2.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l3.pdf;l3.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l4.pdf;l4.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l5.pdf;l5.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l6.pdf;l6.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l7.pdf;l7.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l8.pdf;l8.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l9.pdf;l9.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l10.pdf;l10.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l11.pdf;l11.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l12.pdf;l12.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l13.pdf;l13.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l14.pdf;l14.pdf
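
Two entries that share an output name make the later download silently overwrite the earlier one. A small pre-flight check, assuming the `url;filename` format above; `find_duplicate_names` is a hypothetical helper and the example.com URLs are placeholders.

```shell
#!/bin/bash
# Hypothetical check: print each output file name that appears more than once
# in a "url;filename" list (cut -s skips lines with no ";")
find_duplicate_names() {
  cut -s -d';' -f2 "$1" | tr -d '\r' | sort | uniq -d
}

tmp=$(mktemp)
printf '%s\n' \
  'https://example.com/l12.pdf;l12.pdf' \
  'https://example.com/l13.pdf;l12.pdf' \
  'https://example.com/l14.pdf;l14.pdf' > "$tmp"
find_duplicate_names "$tmp"   # prints: l12.pdf
rm -f "$tmp"
```

Running `find_duplicate_names cs193p.txt` before the download prints nothing when every name is unique.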

demos

CS193p PDFs, Spring 2020, L1 to L14

https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l1.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l2.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l3.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l4.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l5.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l6.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l7.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l8.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l9.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l10.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l11.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l12.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l13.pdf
https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l14.pdf

Linux tr command

tr: translate or delete characters

$ man tr > man-tr.md

$ cat man-tr.md
TR(1)                                             User Commands                                            TR(1)

NAME
       tr - translate or delete characters

SYNOPSIS
       tr [OPTION]... SET1 [SET2]

DESCRIPTION
       Translate, squeeze, and/or delete characters from standard input, writing to standard output.

       -c, -C, --complement
              use the complement of SET1

       -d, --delete
              delete characters in SET1, do not translate

       -s, --squeeze-repeats
              replace  each  sequence  of  a repeated character that is listed in the last specified SET, with a
              single occurrence of that character

       -t, --truncate-set1
              first truncate SET1 to length of SET2

       --help display this help and exit

       --version
              output version information and exit

       SETs are specified as strings of characters.  Most represent themselves.  Interpreted sequences are:

       \NNN   character with octal value NNN (1 to 3 octal digits)

       \\     backslash

       \a     audible BEL

       \b     backspace

       \f     form feed

       \n     new line

       \r     return

       \t     horizontal tab

       \v     vertical tab

       CHAR1-CHAR2
              all characters from CHAR1 to CHAR2 in ascending order

       [CHAR*]
              in SET2, copies of CHAR until length of SET1

       [CHAR*REPEAT]
              REPEAT copies of CHAR, REPEAT octal if starting with 0

       [:alnum:]
              all letters and digits

       [:alpha:]
              all letters

       [:blank:]
              all horizontal whitespace

       [:cntrl:]
              all control characters

       [:digit:]
              all digits

       [:graph:]
              all printable characters, not including space

       [:lower:]
              all lower case letters

       [:print:]
              all printable characters, including space

       [:punct:]
              all punctuation characters

       [:space:]
              all horizontal or vertical whitespace

       [:upper:]
              all upper case letters

       [:xdigit:]
              all hexadecimal digits

       [=CHAR=]
              all characters which are equivalent to CHAR

       Translation occurs if -d is not given and both SET1 and SET2 appear.  -t may be used only when
       translating.  SET2 is extended to length of SET1 by repeating its last character as necessary.
       Excess characters of SET2 are ignored.  Only [:lower:] and [:upper:] are guaranteed to expand in
       ascending order; used in SET2 while translating, they may only be used in pairs to specify case
       conversion.  -s uses the last specified SET, and occurs after translation or deletion.

AUTHOR
       Written by Jim Meyering.

REPORTING BUGS
       GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
       Report any translation bugs to <https://translationproject.org/team/>

COPYRIGHT
       Copyright  ©  2020  Free  Software  Foundation,  Inc.   License  GPLv3+:  GNU  GPL  version  3  or  later
       <https://gnu.org/licenses/gpl.html>.
       This  is  free software: you are free to change and redistribute it.  There is NO WARRANTY, to the extent
       permitted by law.

SEE ALSO
       Full documentation <https://www.gnu.org/software/coreutils/tr>
       or available locally via: info '(coreutils) tr invocation'

GNU coreutils 8.32                               September 2020                                            TR(1)
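
The `-d`, translate (SET1 to SET2), `-s`, and character-class forms described above, applied to strings like the ones the download script handles. A short sketch; the sample values are made up.

```shell
#!/bin/bash
# -d: delete every carriage return, as the script does for CRLF list files
clean=$(printf 'l1.pdf\r' | tr -d '\r')

# SET1 -> SET2: translate ";" into a newline, splitting "url;filename" onto two lines
split=$(printf '%s' 'https://example.com/l1.pdf;l1.pdf' | tr ';' '\n')

# -s: squeeze each run of repeated spaces down to a single space
squeezed=$(printf '%s' 'a   b    c' | tr -s ' ')

# Character classes: map lower-case letters to upper case
upper=$(printf '%s' 'cs193p' | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$clean" "$split" "$squeezed" "$upper"
```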

emmet

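Use VS Code's Emmet syntax to generate the numbered URLs: `$` is Emmet's numbering placeholder, so `*14` produces l1.pdf through l14.pdf (each wrapped in a `<p>` element, since the abbreviation starts with `p`).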

p{https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l$.pdf}*14

https://code.visualstudio.com/docs/editor/emmet

https://emmet.io/

https://github.com/emmetio/emmet
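
As a shell alternative to the Emmet abbreviation, a loop can print the whole cs193p.txt list in the `url;filename` format directly, with no HTML tags to strip afterwards. A sketch assuming the lectures keep the `l1.pdf` to `l14.pdf` naming shown above; `gen_list` is a hypothetical function name.

```shell
#!/bin/bash
# Hypothetical generator: print "url;filename" lines for lectures 1..14
gen_list() {
  local base="https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file"
  local i
  for i in {1..14}; do
    printf '%s/l%d.pdf;l%d.pdf\n' "$base" "$i" "$i"
  done
}

gen_list
```

Redirect the output to create the list file: `gen_list > cs193p.txt`.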

TODO

Node.js version


const fs = require("fs");
const path = require("path");
const log = console.log;
// Uses the global fetch() built into Node.js 18+. The `request` package is
// deprecated, and `await request.get()` does not resolve to the response body.
// const request = require("request");

// Output folder: ../pdf relative to this script
const folder = path.resolve(__dirname, "../pdf");

// log('folder', folder);

// Create the folder (and any missing parents) if it does not exist
fs.mkdirSync(folder, { recursive: true });

async function downloadPDF(url, filename) {
  log("🚧 pdf downloading ...");
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} for ${url}`);
  }
  // Read the body as binary data (not text) so the PDF is not corrupted
  const pdfBuffer = Buffer.from(await res.arrayBuffer());
  fs.writeFileSync(filename, pdfBuffer);
  log("✅ pdf finished!");
}

const url = "https://cs193p.sites.stanford.edu/sites/g/files/sbiybj16636/files/media/file/l1.pdf";
const filename = path.join(folder, "cs193p-2021-l1.pdf");

// log('filename =', filename);

downloadPDF(url, filename).catch((err) => {
  console.error("❌", err.message);
  process.exitCode = 1;
});



https://www.cnblogs.com/xgqfrms/p/16086580.html

npm package

$ npm i -g auto-download-files

https://www.npmjs.com/package/auto-download-files

Python version


refs

TypeScript & Node.js crawler All In One

https://www.cnblogs.com/xgqfrms/p/16086580.html



©xgqfrms 2012-2020

www.cnblogs.com/xgqfrms publishing setting: posts are accessible to registered users only!

Original article, all rights reserved ©️xgqfrms. Reposting is prohibited 🈲️; infringement will be pursued ⚠️!


posted @ 2022-03-29 19:44  xgqfrms