After Reading This, You'll Definitely Master Linux (Not an Ad)

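1. Linux Command Overview

This part is for readers who already have a little Linux experience. If you are a beginner, skip it and go straight to Part 2.

1.1 Directory Operations

At work, what you deal with most often is directories and files. Linux provides commands to operate on them, and those commands are abstracted and abbreviated.

1.1.1 Basic Operations

Perhaps because these commands are used so often, typing even one extra character feels like a crime. So they are all very short: you don't need numbers to count their letters, a two-finger V sign will do.

The commands:
mkdir  create a directory (make dir)
cp     copy files (copy)
mv     move files (move)
rm     delete files (remove)

Examples:

# Create directory d together with its parents a, b and c
mkdir -p a/b/c/d
# Copy directory a into /tmp
cp -rvf a/ /tmp/
# Move a to /tmp and rename it to b
mv -vf a /tmp/b
# Delete every file on the machine: a joke, never run it (GNU rm refuses / unless given --no-preserve-root)
rm -rvf /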

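1.1.2 Getting Around

Linux greets you with a pitch-black command line, and you still face life's three big questions: Who am I? Where am I? Where am I going?

ls    lists everything in the current directory; ls -l shows more detail and tells you who you are.
pwd   prints the directory the terminal is currently in. It tells you where you are.
cd    if you went to the wrong place, cd takes you to the right directory.
find  filters by conditions and finds files you have long forgotten.

As for where you are going, that is probably up to the will of a higher power.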

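1.2 Text Processing

This skill earns you a lot of points. Once you have it, you save time you can spend studying object orientation. xjjdog (小姐姐味道) has already published a series on the most common vim, sed, and awk tricks.

1.2.1 Viewing Files

cat
cat is the command used most. Note that if the file is very large, cat floods the terminal with output; press Ctrl+C (several times if needed) to stop it.

# Check the file size
du -h file
# View the file contents
cat file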

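less
Because cat has this problem, larger files are better opened with less. As in vim, typing / in less enters search mode, and n (N) jumps to the next (previous) match.
Many other operations resemble vim too, so you can learn them by analogy.

tail
Most server-side developers know this command, for example for following nginx's rolling log:

tail -f access.log

tail can also statically show the last n lines of a file, and its counterpart head shows the first n lines. But head has no follow mode: a tail grows outward, never back inward.

tail -n100 access.log
head -n100 access.log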

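1.2.2 Counting

sort and uniq are often used together. sort takes -t to specify the field separator and -k to specify the column to sort by.

The command below prints each IP in the nginx log together with its PV (page views), showing the 10 IPs with the highest PV:

# Sample log line:
# 2019-06-26T10:01:57+08:00|nginx001.server.ops.pro.dc|100.116.222.80|10.31.150.232:41021|0.014|0.011|0.000|200|200|273|-|/visit|sign=91CD1988CE8B313B8A0454A4BBE930DF|-|-|http|POST|112.4.238.213
awk -F"|" '{print $3}' access.log | sort | uniq -c | sort -nk1 -r | head -n10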
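You can try the pipeline without a real nginx server. Below is a minimal sketch. It assumes GNU awk and coreutils, and it fabricates six log lines in the same |-separated layout. The IPs and paths are made up, and the lines have fewer fields than a real log. Only field 3, the client IP, matters.

```bash
#!/usr/bin/env bash
# Work in a throwaway directory so no real log is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up log lines; field 3 is the client IP.
cat > access.log <<'EOF'
2019-06-26T10:01:57+08:00|nginx001|100.116.222.80|10.31.150.232:41021|200|/visit
2019-06-26T10:01:58+08:00|nginx001|100.116.222.81|10.31.150.232:41022|200|/visit
2019-06-26T10:01:59+08:00|nginx001|100.116.222.80|10.31.150.232:41023|200|/sign
2019-06-26T10:02:00+08:00|nginx001|100.116.222.82|10.31.150.232:41024|404|/none
2019-06-26T10:02:01+08:00|nginx001|100.116.222.80|10.31.150.232:41025|200|/visit
2019-06-26T10:02:02+08:00|nginx001|100.116.222.81|10.31.150.232:41026|200|/visit
EOF

# awk:          keep only the IP column
# sort:         put identical IPs next to each other (uniq only merges adjacent lines)
# uniq -c:      prefix each distinct IP with its count (the PV)
# sort -nr -k1: order by that count, largest first
# head -n 10:   keep the top 10
top=$(awk -F'|' '{print $3}' access.log | sort | uniq -c | sort -nr -k1 | head -n 10)
echo "$top"
```

The first line of output is the busiest IP, 100.116.222.80, with a count of 3.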

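1.2.3 Other Tools

grep
grep filters content. With --color it highlights matches on terminals that support color. -n prints the line number of each match, so you can locate it quickly.
For example, to see the POST requests in the nginx log:

grep -rn --color POST access.log

I recommend using these options every time.

To see the lines around an exception, use the A, B, and C options. They are abbreviations and you will use them all the time:
A  after: n lines after the match
B  before: n lines before the match
C  context: n lines before and after the match
Like this:

grep -rn --color Exception -A10 -B2 error.log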

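diff

diff compares two files and shows how they differ. IDEs offer this too, of course; diff is simply the raw command-line equivalent. Some platforms still distribute source patches with diff and patch. If you don't use them, just skip this.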

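1.3 Compression

Files are usually compressed to make them smaller to transfer. Common archive and compression formats on Linux include tar, bzip2, zip, and rar; 7z is used less often.

.tar  created or extracted with the tar command
.bz2  handled by bzip2
.gz   handled by gzip
.zip  extracted with unzip
.rar  extracted with unrar

The most common format is .tar.gz: the files are first bundled with tar and then compressed with gzip.

Create an archive:

tar cvfz archive.tar.gz dir/

Extract it:

tar xvfz archive.tar.gz

Go figure out how these commands relate to each other.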

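1.4 Daily Operations

You boot a machine by pressing the power button, but surely shutting it down isn't a long press of the same button. Right, it's the shutdown command, though you usually won't have permission to run it -.-!. passwd changes your password, and that permission you probably do have.

mount
mount attaches external devices such as a USB drive, an ISO image, or a freshly provisioned SSD. Now you can watch your little movies in peace.

mount /dev/sdb1 /xiaodianying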

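chown
chown changes the user and group that own a file.
chmod changes a file's access permissions.

Both commands relate to Linux file permissions, such as the famous 777.
Examples:

# A devastating command: never run it
chmod 000 -R /
# Change the owner and group of directory a to xjj
chown -R xjj:xjj a
# Add execute permission to a.sh (used all the time)
chmod a+x a.sh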
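The sketch below shows what chmod a+x and octal modes actually do. It assumes GNU stat, as shipped with CentOS: -c '%A' prints the symbolic permission string and -c '%a' the octal one. It only touches a scratch file in a temporary directory, never /.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '#!/bin/sh\necho hello\n' > a.sh
chmod 644 a.sh                 # start from a known rw-r--r-- state, whatever the umask
before=$(stat -c '%A' a.sh)    # symbolic mode string
chmod a+x a.sh                 # symbolic form: add x for user, group, and others
after=$(stat -c '%A' a.sh)
chmod 754 a.sh                 # octal form: user rwx (4+2+1), group r-x (4+1), others r-- (4)
octal=$(stat -c '%A' a.sh)
digits=$(stat -c '%a' a.sh)
echo "$before -> $after; chmod 754 gives $octal ($digits)"
```

Each octal digit is the sum of r=4, w=2, x=1 for one of the three classes.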

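yum
If you use CentOS, the package manager is yum. If your system lacks wget, install it like this:

yum install wget -y

systemctl
CentOS also has its own routine for managing background services. The service command is one option, and systemctl is compatible with service. Here is how to restart the mysql service; the second form is recommended.

service mysql restart
systemctl restart mysqld

Ordinary processes need finer control with kill. kill can send many signals. If you habitually use kill -9, you should learn how it differs from kill -15 and kill -3, and when to use each.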

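su
su switches users. For example, if you are root and want to do some dirty work as xjj, switch with su.

su xjj
su - xjj

The - gives you a clean login as the other account, with that account's own environment. Unless you have a reason not to, use it.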

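1.5 System Status Overview

When you log into a Linux machine, a few commands help you find problems quickly. Together they cover memory, CPU, network, I/O, and disk.

uname
uname prints the current kernel information, so you know what kind of machine you are on.

uname -a

ps
ps shows process and thread status. It overlaps somewhat with top and is used all the time.

# Find the java processes
ps -ef | grep java

top
top gives an overview of system state. Mainly you look at the CPU load, CPU usage, and the processes using the most memory or CPU. The command below shows the threads inside one process:

top -H -p pid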

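free
top can show memory too, but not conveniently. free is dedicated to memory, covering both physical memory and swap.

df
df shows disk usage, so you can tell whether a disk is full. -h prints sizes in human-readable form.

df -h

ifconfig
Shows IP addresses. Nothing more to say; its replacement is ip addr.

ping
To check whether the network is reachable, probe it with ping (this won't work for sites that block ping).

netstat
ss can replace netstat, but in practice netstat is still more widely used. For example, to list all current TCP connections:

netstat -ant

This command is very helpful for questions such as which ports are open locally.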

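1.6 Everyday Work

Some other commands come up very often at work; they are all familiar faces.

export
Many people who have installed a JDK can't find the java command, and export fixes that. export sets environment variables, and env shows all environment variables in the current environment. For example, the following adds the JDK to PATH:

export PATH=$PATH:/home/xjj/jdk/bin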
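Here is a minimal sketch of the same idea. A temporary directory stands in for /home/xjj/jdk/bin, and hello-xjj is a hypothetical command name used only for the demo. command -v is the shell builtin that reports where a command would be found through PATH.

```bash
#!/usr/bin/env bash
# A throwaway directory plays the role of /home/xjj/jdk/bin.
mybin=$(mktemp -d)
# hello-xjj is a made-up command used only for this demo.
printf '#!/bin/sh\necho hello from mybin\n' > "$mybin/hello-xjj"
chmod u+x "$mybin/hello-xjj"

before=$(command -v hello-xjj || echo "not found")
export PATH="$PATH:$mybin"     # append, so commands already in PATH keep priority
after=$(command -v hello-xjj)
echo "before: $before"
echo "after:  $after"
```

Appending ($PATH:dir) rather than prepending means a same-named command earlier in PATH still wins. Prepend instead if you deliberately want to override it.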

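Sometimes you want to know the exact path of a command you run. Use whereis for that; here I'm assuming you have several JDK versions installed.

crontab
This is Linux's local job scheduler. It isn't distributed, so unless you work in operations, don't rely on it. For example, a reminder every 10 minutes to drink tea and go to the bathroom:

*/10 * * * * /home/xjj/wc10min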

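date
date prints the current system time. Use a +FORMAT argument to choose the output format, and -s to set the time. Setting the hardware clock is a separate matter handled by another command, hwclock.

xargs
xargs reads its input and turns it into arguments for another command. It is very useful. For example, to delete all class files in a directory:

find . | grep '\.class$' | xargs rm -rvf
# Copy all rmvb files into a directory
ls *.rmvb | xargs -I {} cp {} /mount/xiaodianying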
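Filtering find's output with grep and passing it to plain xargs has two weak spots. An unescaped . in a pattern such as .class$ matches any character, and xargs splits file names at spaces. The safer sketch below assumes GNU find and xargs. find matches the suffix itself and passes NUL-separated names. The directory layout is made up for the demo.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p src/com/xjj
touch 'src/com/xjj/A.class' 'src/com/xjj/My Class.class' 'src/com/xjj/A.java' 'src/xclass'

# -name matches the literal suffix (src/xclass survives), and -print0 with
# xargs -0 keeps "My Class.class" as one argument instead of two.
find . -type f -name '*.class' -print0 | xargs -0 rm -fv

remaining=$(find . -type f | sort)
echo "$remaining"
```

Only A.java and xclass should be left. With the plain grep .class$ filter, xclass would have been deleted as well.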

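1.7 Networking

Linux is a multitasking network operating system, so it has a great many network commands. At work, these are the ones you touch most.

ssh
No need to say much about this one. You will surely want to learn what SSH tunneling is. For verbose output of the connection process, add -v.

scp
scp transfers files, and directories too. There is also the more capable sftp.

scp a.txt 192.168.0.12:/tmp/a.txt
scp -r a_dir 192.168.0.12:/tmp/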

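wget
To install a JDK on a server, you wouldn't download it locally first and then scp it over, would you (though sometimes you have to)? wget downloads files straight from the command line and can resume interrupted downloads.

wget -c http://oracle.fuck/jdk2019.bin

mysql
MySQL is everywhere, and not everyone gets to use Navicat. You need to know how to connect with the mysql client and do basic operations, so that you can cope calmly when something goes wrong.

mysql -u root -p -h 192.168.1.2

Don't let this intimidate you: the commands are finite, but enthusiasm is boundless. And don't get cocky once you know them all; vim alone can keep you busy for a lifetime. The shortcut is summarizing; depth only comes from exploring. Time flies, and one day it will all flow effortlessly from your fingertips.
People change and youth fades. Only time never lets you down.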

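2. Choosing a Linux Distribution

Unix is quite similar to Linux, but if you are in your twenties or thirties, you have probably only ever known the Linux world. Linux is everywhere: from phones, to CentOS on servers, to the good-looking desktop distribution Ubuntu, to the globally popular Raspberry Pi.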

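2.1 Some Linux History You Should Know

Knowing a bit of operating-system history is good for the soul. GNU/Linux grew out of resistance to the monopolistic practices of some commercial companies, and it embodies a generation of internet people's longing for freedom.

Compared with other Unixes, Linux is actually very young. Only in 1991 did a young Finn named Linus Torvalds start developing the Linux kernel we know today.

Linux's mascot is a penguin, which was only settled on in 1996, so you will come across some funny pictures. If you were born in the 1990s, this little penguin is about your age, still a young fellow.

Linux's history is fairly complicated. It overcame obstacle after obstacle to get where it is today, which was no easy feat. You can view a high-resolution chart of that history via the link below. Twenty years is a very long time in the software industry, and much of the former glory has since faded.

High-resolution image: http://1t.click/aUnx. As you can see, Linux occupies only a pitiful little sliver. It's like the appearance of humans in the long history of life: insignificant, yet a qualitative leap.

You may have noticed that the text above says GNU/Linux rather than just Linux. Linux itself is only a kernel and does little on its own. Only combined with GNU, as a complete ecosystem, does it show its power.

I bring up the distinction so that we remember the GNU Project, which Richard Stallman launched in 1983. He also wrote a behemoth of an editor, Emacs.

Only when a person is worshipped as a god does he gain the power to torment you.

We won't go further into Linux history. Next, let's look at a few classic distributions.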

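2.2 Selected Distributions

There are now more than a thousand Linux distributions; if you like the idea and have the money, you can make one too. Finding the most suitable one takes some tinkering. Many distributions are actually quite niche.

This is not like certain things in philosophy, where truth lies in the hands of the few. Only distributions that develop well and gain recognition have value; it is pragmatism through and through.

Yet distributions are also like girlfriends: at first each seems distinctive, with its own charms, and in the end you find they are all equally mundane. Still, some people love Linux work and stick with it for a lifetime...

Let me share my own path. I started with Red Hat, back before it was split into an enterprise edition. After a while I switched to the more stable Slackware. But Slackware's software updates were far too slow, so I moved to Fedora, of Red Hat lineage, whose software is always fresh. Along the way I tried several other distributions. Finally, around 2013, I settled on rolling-release Arch Linux, which I still use today.

If I had to recommend:
1. Individual (technical) users, desktop: Ubuntu => Arch Linux.
2. Enterprise users, servers: CentOS.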

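2.3 Main Lineages

For all their number, Linux distributions follow two main lines: the Debian family and the Red Hat family. Many distributions are refurbished versions that directly take one of these two bases and modify it. As the Chinese saying goes: operating systems come in thousands of flavors, but they are all hats and poop (Red Hat, and Debian, whose name sounds like dabian, Chinese for "poop").

debian

Debian's swirl logo does look a bit like poop, and, uh, its name is just one letter off from dabian.

The Debian Project is an association dedicated to creating a free operating system. It is known for stability and security and has been developing for more than 20 years. Ubuntu, which we know well, is an improved derivative of Debian.

redhat

Red Hat is a commercial company that got into Linux early; today it offers individuals certifications such as the Red Hat certification. CentOS, widely used on cloud hosts, together with Red Hat's own RHEL, holds most of the server market. Recently CentOS 8 launched CentOS Stream, a rolling-release version, which looks more like a normal operating system.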

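2.4 Representative Distributions

Let's look at a few representative distributions at different levels. In terms of use, Linux serves desktop, server, research, and other purposes.

2.4.1 Ubuntu

Ubuntu has made an indelible contribution to popularizing Linux. It is an easy-to-install desktop distribution (there is also a server edition) with a very attractive interface. Ubuntu is based on Debian's unstable branch, and its package manager is apt-get.

Its creator is Mark Shuttleworth, a South African entrepreneur and the world's second self-funded space tourist. Whether it's space or Ubuntu, I suppose both are dreams.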

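2.4.2 CentOS

CentOS is currently the most popular server distribution. It is a rebuild of the RHEL source code, done mainly to get around some legal issues. In package management, and even in stability, it is no different from Red Hat Enterprise Linux.

2.4.3 Arch Linux

Arch Linux uses a rolling-release model and strives to provide the latest stable versions. A fresh install gives you only a base system, without even a graphical interface, which is not very friendly to beginners.

However, Arch Linux is a very clean system, and many packages are installed only when you need them. Its software and ideas are usually the newest, it is highly customizable, and many Linux enthusiasts love it.

2.4.4 Gentoo

Arch Linux ships precompiled packages: to install software, you just download and unpack it. Gentoo takes this one step further, or you might say to a more perverse extreme. It downloads the source code, then compiles and installs it locally.

This is usually a pain, because downloading and compiling take a very long time, but it has one big advantage: stability.

This system is lower-level and demands more skill, so I don't really recommend it.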

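2.4.5 LFS

LFS stands for "Linux From Scratch", that is, building a Linux system from nothing. It has a very detailed guide that teaches you how to compile the kernel, compile the bootloader, and compile and configure the essential software.

This is a crazy but worthwhile process. If you want to take your Linux skills to the next level, working through the guide once is immensely rewarding. You go through several rounds of cross-compilation and finally use the chroot command to switch into the new system for the remaining steps.

Want to make your own distribution? Start here.

2.4.6 Kali

Kali Linux is a highly specialized distribution. If you do penetration testing, it is a very good choice.

Its installation image is very large and includes common cracking, penetration-testing, and attack tools. This is quite dangerous: I once used it to brute-force a lot of Wi-Fi passwords and successfully peeked into my neighbors' privacy. It works really well.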

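3. Installing a Clean Linux System

A craftsman must first sharpen his tools. You might think of buying a cloud host to practice on, but that costs money, so we can install one ourselves. As mentioned above, CentOS is the most widely used distribution today. Whether you run your own data center or use a cloud environment like Alibaba Cloud, most will offer CentOS.

There are many ways to set up a virtual machine. In this section we give the VM two virtual network adapters to prepare a pristine environment. This section has lots of pictures.

On Alibaba Cloud, for example, CentOS is the first default option, with images from 7.6 down to older versions.

3.1 Download

The rest of this article uses the stable CentOS 7 as its base environment. CentOS is popular, so there are many mirrors. In China, downloading from the Shanghai Jiao Tong University mirror should be faster.

http://ftp.sjtu.edu.cn/centos/7/isos/x86_64/CentOS-7-x86_64-Minimal-1908.iso

If SJTU stops maintaining it someday, you can find it here:

http://centos.mirror.ndchost.com/7/isos/x86_64/CentOS-7-x86_64-Minimal-1908.iso

So that you learn more, we use the minimal ISO. The minimal ISO is under 1 GB, while the DVD with lots of preinstalled software is 4.3 GB. We'll go with the slimmed-down version.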

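3.2 Installing Linux

The quickest way to learn and try Linux is to install it in a virtual machine. The most popular VM software today is VMware and VirtualBox; macOS also has Parallels Desktop.

VirtualBox is free and cross-platform, which meets our needs. Next, I'll guide you through the installation step by step.

(1) Click New to begin the installation journey.

(2) Fill in the name and version, then click Continue.

(3) Choose the memory size according to your machine.

My machine has 8 GB of RAM, so I give the VM 2 GB, which is plenty.

(4) Create a virtual disk.

After you click Continue, a dialog pops up. Don't bother with it; keep clicking Continue until the dialog disappears. Crude, but effective.

(5) Next, click Settings.

(6) Switch to the Storage tab and select the ISO we downloaded.

(7) Click Start to begin the installation.

Use the arrow keys to move the highlight to Install CentOS 7, then press Enter to start the installation.

(8) An installer screen appears.

There are quite a few steps next; unless we say otherwise, just click Continue.

(9) Next, configure disk partitioning.

Keep the defaults and press Done to exit.

(10) Configure users.

The default administrator account on Linux is root. Here we set root's password to 123456. Because this is a weak password, you need to click Done twice to confirm.

(11) Wait for the installation to finish, then reboot.

(12) Installation succeeded.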

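3.3 Getting Online

At this point the VM can't access the network yet, so it can't make its thoughts known to the world. Since we haven't changed any VM settings, it uses the default NAT mode.

Put the cursor in the command-line window and type dhclient. Wait a few seconds, then run ping baidu.com to test the network; you can see that the network is now reachable.

The pitch-black window above is our Linux interface for now. Some people find it ugly, like playing with DOS, but incurable people like me find it especially cozy.

From here on we won't use screenshots for commands; we'll show them as text blocks instead. To keep you from getting disoriented, please look at the figure first.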

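3.4 Accessing the VM from Outside

Because of how NAT works, our VM can reach the outside network but can't be reached from outside. Even good wine fears a deep alley. To solve this, we need to add another network adapter.

Before making these changes, shut down the VM. You can force it off, or type in the command line:

shutdown -h now

Once the VM is off, click Settings again and switch to the Network tab. As shown, add a new adapter of type Host-only Adapter. Through this adapter, the host machine can reach the VM.

Start the VM again, run dhclient, then run ip addr to see its IP addresses. We now have two network interfaces and two IP addresses.

Note down the address that starts with 192; we will connect to it with external programs such as XShell or SecureCRT. Mine, for example, is 192.168.99.100. Enough talk; see the figure.

Tip: if your virtual adapter uses a different subnet, you can change it in the global settings to match mine.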

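3.5 Remote Connection

You've probably noticed that typing in the VM's built-in console is very limiting. A remote connection lets you switch to a terminal you're familiar with, ideally one that shows color. Below are a few tools; XShell is the most commonly used.

Windows

XShell: At work you may have seen your SRE colleagues' fingers fly, with commands pouring across the screen like water. Even long, hard-to-remember passwords go in instantly. They are probably using XShell.

SecureCRT: an older product that is also widely used.

MobaXterm: a single-file portable program. Download the exe and run it directly, with no installation needed.

They all come in free and professional editions. If you can't afford to buy one, you might look for a cracked version. But be careful: some ill-intentioned people plant trojans in pirated or unofficially localized SSH clients to steal your passwords and certificates, and this has happened many times.

MacOS

For macOS users it's simple: just use iTerm and type commands. For example, connect to our machine with:

ssh root@192.168.99.100

Linux

Well, if you're already on Linux, why bother with a VM? Just use what you have.

I recommend connecting over SSH with tools like XShell, SecureCRT, or iTerm; copying and verifying commands is much more convenient that way.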

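4. A First Look at the Linux Command Line

Every beginning is hard. Facing the pitch-black Linux window, take that first brave step. Don't be afraid of typing something wrong; the system is robust. The command line is usually more efficient than a graphical interface. More importantly, it lets you build small automation tools, which brings a qualitative leap in productivity.

You now have CentOS installed and are connected to it remotely. We own it, but we don't know its temperament yet. Next, let's enter the world of the Linux command line. Make a contract with me, young one.

This section walks through, in very detailed steps, how a command is created and executed.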

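4.1 A Simple Try

Alright, we're in the terminal now. What is a terminal? It's the pitch-black screen you see in many hacker movies: an interactive interface where you type strings. As for those flashing, scanner-like things, they don't exist.

Try typing something, such as jdsjf.

[root@localhost ~]# jdsjf
-bash: jdsjf: command not found

Look at that output again. What happened? The message simply means "command not found". And what is the command? The string jdsjf we just typed at random.

Next, let's look at the other useful information in the message.

bash is the shell we are using. A shell can be thought of as an interpreter that translates what we type into a series of executable instructions. bash is the most popular shell on current Linux distributions, and almost all of them preinstall it.

"Command not found" means bash couldn't interpret our string. However, files in certain directories can be found by default, and the set of these directories is called PATH. PATH is also an environment variable, and we can take a look at it with a command:

[root@localhost ~]# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

To see which commands the system has, look at the files in these directories. There are a great many, but we'll never touch most of them. That's why xjjdog wrote this piece: to focus on the most common and useful commands, their most common options, and their most useful scenarios.

Before the output there is something extra, such as [root@localhost ~]. This is the prompt; the cursor blinks after it, waiting for your input. It can be customized, even made very pretty.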

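4.2 Hello World

So far we haven't achieved anything. Programmer tradition says we should write a hello world program. In a terminal shell this is simple, far simpler than writing one in Java.

[root@localhost ~]# echo "Hello World"
Hello World

As shown, echo means "print something". The Hello World after it is called an argument; arguments are separated by spaces, and echo accepts several of them.

[root@localhost ~]# echo "Hello World" , "Fuck 996"
Hello World , Fuck 996

The command above runs fine, which proves echo is a command our terminal recognizes. So where does this command live? You can find out with whereis:

[root@localhost ~]# whereis echo
echo: /usr/bin/echo /usr/share/man/man1/echo.1.gz

The output shows that the full path of echo is /usr/bin/echo, and because that is in a PATH directory, it can be found. (Strictly speaking, bash also has a builtin echo that takes precedence; type echo shows this.)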

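4.3 Adding a Command to PATH

Next, we'll turn the command above into a script and put the script in a PATH directory. But wait: first we need to give the command a name.

First create a file. On Linux, touch creates files.

[root@localhost ~]# touch jdsjf

Nothing seems to happen; touch just created an empty file. Now let's add some content.

[root@localhost ~]# echo "echo 'Hello World'" > jdsjf

Note the > symbol: it redirects the output of the preceding command into the file that follows. After this command, jdsjf contains echo 'Hello World'.

Next, let's try running the command we just created.

[root@localhost ~]# ./jdsjf
-bash: ./jdsjf: Permission denied

We ran it by relative path, and the terminal says we don't have permission to execute it.

Linux is very detailed in its permission control. A file has three attributes: readable, writable, and executable. For a file to run, it needs execute permission, which chmod adds.

[root@localhost ~]# chmod u+x jdsjf
[root@localhost ~]# ./jdsjf
Hello World

We'll cover permissions in detail in a later chapter. As shown, the command now prints correctly. Next, we move it into a PATH directory.

[root@localhost ~]# mv jdsjf /usr/local/bin/
[root@localhost ~]# jdsjf
Hello World

No relative path is needed any more: just typing jdsjf prints Hello World. We have made a meaningless string express its thoughts, though we are still its master.

Think about these three questions:

1. Can I define my own directory, say /root/mybin, and add it to PATH?

2. Can I skip the touch command above and create the file directly with redirection?

3. Besides putting it in PATH or using a relative path, are there other ways to run a command?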

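5. Getting Around Linux

To understand basic Linux usage, you need to know one basic fact: on Linux, everything is a file.

Commands, documents, even devices, directories, and sockets are all operated on in the same way. Many people who write drivers find that the functions they use are no different from those for reading and writing files (open, close, read, write, ioctl). Here we focus on ordinary files and directories, and this section explains the related commands in detail.

5.1 Current Path

So far we don't know where we are in the system. In a browser, the URL in the address bar tells us our exact coordinates on the internet. Here the same job is done by pwd, which prints the current working directory.

pwd is used very, very often, especially on machines whose prompt isn't configured helpfully. Shell scripts also use it to check whether the current directory is what they expect.

Many production incidents happen because someone didn't confirm the current directory before a dangerous command like rm -rf *. Checking the current directory before any high-risk command is a good habit.

[root@localhost ~]# pwd
/root

After logging in as root, we land in the /root directory by default. Directory levels on Linux are separated by /.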

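5.2 The Filesystem Hierarchy Standard

The Linux file system has followed a standard from the start, with its own acronym: FHS (Filesystem Hierarchy Standard). After years of evolution, the directory structure has become increasingly clear. Besides what the standard requires, there are also conventions among users.

Next, let's take a rough look at the default directories on Linux to get a feel for them.

Level 1   Level 2     Description
/bin                  symbolic link to /usr/bin
/sbin                 symbolic link to /usr/sbin
/lib                  symbolic link to /usr/lib
/usr      /usr/bin    commonly used commands
          /usr/sbin   commands commonly used by administrators
          /usr/lib    dynamic libraries and module files
/sys                  a visible interface to kernel data structures
/proc                 memory image
/run                  memory image
/boot                 bootloader and kernel-related files
/dev                  device files, such as the CD-ROM drive
/etc                  global and application configuration files
/var                  like /var/run, files the system needs at runtime, such as MySQL's pid file
/tmp                  a very special temporary directory; on many systems it is cleared on reboot
/home     /home/**    user directories; mine, for example, is /home/xjjdog
/root                 the root user's home directory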

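home: The directory we deal with most is our own user directory, where we can do anything we like; for root, that is /root. Personal material such as videos, audio, downloads, or test data can be organized under it however you see fit. root is special: ordinary users' personal directories all live under /home.

/etc: You will work with /etc often. It stores global system configuration files and application configuration files. For example, if you install php or nginx, their configuration files sit in some folder under /etc.

/var: /var holds runtime data, some of it essential and some not. Intruders leave traces in some of these files and take pains to clean them up. /var is also the default data location for some applications, such as MySQL's data files.

/tmp: A special temporary directory. On many systems its files disappear after a reboot. All users can write to it, so it is often used to exchange files.

/proc and /sys are two magical directories. Both are pseudo file systems: by changing some of their files you can control program behavior (the changes go straight to memory, which is very cool). At first there was only /proc. Because its contents became numerous and messy, /sys was later split out to control some kernel behavior. If you tune system parameters, you will spend a lot of time with these files.

A few empty directories are not in the table above. For example, /srv is meant for web service data, such as nginx's. But this isn't mandatory, so the /srv directories I've seen are usually empty. /opt is similar; you can just pretend it doesn't exist. These are up to the user to plan and are highly customizable.

You can also create your own directories. For example, I like to create /data for database-related content: /data/mysql holds MariaDB data and /data/es/ holds Elasticsearch indexes.

Linux has many file types, and most are stored by category in the matching directories. For example, /dev holds device files and /bin holds executable commands. They are usually easy to remember.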

5.3 Listing Files

So how did I see the contents of that table? From memory? The ls command lists the files in a directory. It could be voted the hardest-working command on Linux.

Modern terminals print in color, which is very intuitive. Projects like oh-my-zsh and oh-my-bash can make your terminal even prettier. Add them to your to-research list.

[root@localhost /]# ls /
# Note: ls accepts a path argument, so you can list a directory without cd-ing into it first
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
[root@localhost /]# ls -l /
# With -l you get more detailed information
total 20
lrwxrwxrwx. 1 root root 7 Nov 3 20:24 bin -> usr/bin
dr-xr-xr-x. 5 root root 4096 Nov 3 20:34 boot
drwxr-xr-x. 19 root root 3080 Nov 3 21:19 dev
drwxr-xr-x. 74 root root 8192 Nov 3 20:34 etc
drwxr-xr-x. 2 root root 6 Apr 11 2018 home
lrwxrwxrwx. 1 root root 7 Nov 3 20:24 lib -> usr/lib
lrwxrwxrwx. 1 root root 9 Nov 3 20:24 lib64 -> usr/lib64
drwxr-xr-x. 2 root root 6 Apr 11 2018 media
drwxr-xr-x. 2 root root 6 Apr 11 2018 mnt
drwxr-xr-x. 2 root root 6 Apr 11 2018 opt
dr-xr-xr-x. 108 root root 0 Nov 3 21:19 proc
dr-xr-x---. 2 root root 135 Nov 4 07:53 root
drwxr-xr-x. 24 root root 740 Nov 3 21:20 run
lrwxrwxrwx. 1 root root 8 Nov 3 20:24 sbin -> usr/sbin
drwxr-xr-x. 2 root root 6 Apr 11 2018 srv
dr-xr-xr-x. 13 root root 0 Nov 3 21:19 sys
drwxrwxrwt. 9 root root 4096 Nov 4 03:40 tmp
drwxr-xr-x. 13 root root 155 Nov 3 20:24 usr
drwxr-xr-x. 19 root root 267 Nov 3 20:34 var

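The most common ls options are -l and -a.

5.3.1 Detailed Information

With -l you can see a file's permissions, modification date, and so on. But we also see something more interesting, for example:

lib -> usr/lib

This line describes a symbolic link.

As the table above shows, the lib directory is a shortcut to /usr/lib, and their contents are exactly the same.

For more on what ls -l shows, see my figure below. We'll come back to it after covering the later sections.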

5.3.2 Hidden Files

Run ls -al directly in your /root directory and you'll see more. These extra hidden files all start with ., and most of them are configuration files. That's what -a does.

[root@localhost ~]# ls -al
total 28
dr-xr-x---. 2 root root 135 Nov 4 07:53 .
dr-xr-xr-x. 17 root root 224 Nov 3 20:28 ..
-rw-------. 1 root root 1273 Nov 3 20:28 anaconda-ks.cfg
-rw-------. 1 root root 246 Nov 4 11:41 .bash_history
-rw-r--r--. 1 root root 18 Dec 28 2013 .bash_logout
-rw-r--r--. 1 root root 176 Dec 28 2013 .bash_profile
-rw-r--r--. 1 root root 176 Dec 28 2013 .bashrc
-rw-r--r--. 1 root root 100 Dec 28 2013 .cshrc
-rw-r--r--. 1 root root 129 Dec 28 2013 .tcshrc

Observant readers will notice two special directories, . (dot) and .. (dot-dot). The first is the current directory and the second its parent.

With cd, you can move freely among these directories.

Tip: if English dates are hard to read, use ls -al --full-time for a more readable date format.

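5.4 Changing Directories

cd switches the working directory to the target folder. To show how cd works, run the following as root; it creates a 7-level directory tree.

cd
mkdir -p a1/b2/c3/d4/e5/f6/{g7,g8,g9,g10}

We use cd to go down to the deepest level, then use .. to go up one level.

[root@localhost ~]# cd a1/b2/c3/d4/e5/f6/g7
[root@localhost g7]# pwd
/root/a1/b2/c3/d4/e5/f6/g7
[root@localhost g7]# cd ..
[root@localhost f6]# pwd
/root/a1/b2/c3/d4/e5/f6

So to go up n levels, just chain ../ n times. A few special symbols need explaining:

../  the parent directory

../../  two levels up

./  the current directory

~  the current user's home directory; it's a shorthand

-  switches back and forth between the two most recently visited directories

Let's verify these special symbols with commands.

# Jump to the user's home directory
[root@localhost tmp]# cd ~
[root@localhost ~]# pwd
/root
# Go into the third-level directory
[root@localhost ~]# cd a1/b2/c3/
[root@localhost c3]# pwd
/root/a1/b2/c3
# Go back up three levels
[root@localhost c3]# cd ../../..
[root@localhost ~]# pwd
/root
# Jump to the previously visited directory
[root@localhost ~]# cd -
/root/a1/b2/c3
[root@localhost c3]# pwd
/root/a1/b2/c3
# Go into the current directory: this does nothing
[root@localhost c3]# cd ./
[root@localhost c3]# pwd
/root/a1/b2/c3

That covers the common uses of cd. Now let's return to mkdir. As the name suggests, it creates directories. At work you will usually add -p, which creates several levels at once. Note the braces {} in the mkdir command. This is bash brace expansion, and it lets you name several directories to create in one go, which often saves a lot of time.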

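5.5 File Operations

Working with files from the command line is very convenient.

touch  create a new file

cp  copy files

mv  move files

rm  delete files

These four flashy commands decide where your files end up. We'll keep using the directories created above for the next operations.

# Create three files
[root@localhost ~]# touch long-long-long.txt
[root@localhost ~]# touch 996.txt
[root@localhost ~]# touch icu.txt
[root@localhost ~]# ls
996.txt a1 anaconda-ks.cfg icu.txt long-long-long.txt
# Copy a file, and rename one with mv
[root@localhost ~]# cp 996.txt 007.txt
[root@localhost ~]# mv long-long-long.txt short.txt
[root@localhost ~]# ls
007.txt 996.txt a1 anaconda-ks.cfg icu.txt short.txt
# Move 996.txt into a1 and icu.txt into a1/b2
# Delete short.txt
[root@localhost ~]# mv 996.txt a1/
[root@localhost ~]# mv icu.txt a1/b2/
[root@localhost ~]# rm short.txt
rm: remove regular empty file ‘short.txt’? y
# Recursively delete the a1 directory
[root@localhost ~]# rm -rvf a1/
removed directory: ‘a1/b2/c3/d4/e5/f6/g7’
removed directory: ‘a1/b2/c3/d4/e5/f6/g8’
removed directory: ‘a1/b2/c3/d4/e5/f6/g9’
removed directory: ‘a1/b2/c3/d4/e5/f6/g10’
removed directory: ‘a1/b2/c3/d4/e5/f6’
removed directory: ‘a1/b2/c3/d4/e5’
removed directory: ‘a1/b2/c3/d4’
removed directory: ‘a1/b2/c3’
removed ‘a1/b2/icu.txt’
removed directory: ‘a1/b2’
removed ‘a1/996.txt’
removed directory: ‘a1/’
[root@localhost ~]# ls
007.txt anaconda-ks.cfg

After all that, only 007 is left. Beyond these basic operations, here are some more important features.

Notice that rm asked for confirmation before deleting the file. This guards against accidental deletion; on CentOS, root's rm is aliased to rm -i, which is where the prompt comes from. When you don't want the prompt, add -f. -f works the same way for cp, mv, and similar commands; it means force.

rm -f file
cp -f file1 file2
mv -f file1 file2

There is also -r, meaning recursive. Directories and files usually have several levels, and -r applies the operation to all of them, as in the recursive deletion of a1 above.

# Warning: the following command has severe consequences
rm -rf /

You have surely seen this command many times. It's no joke: many users have lost data to it. This is the legendary "delete root", and you end up with nothing. So what does -v do? It shows each step the command takes. I usually add it in day-to-day work.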

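6. Working with Files

You may already know that the first column of ls -l shows the Linux file type. Get a rough sense of these types, because many later commands rely on them.

-  regular file

d  directory

l  link, similar to a shortcut

s  socket

c  character device, such as many of the files in /dev/

b  block device, such as some disks

p  pipe (FIFO)

Files on Linux need not have an extension, and you can create some counterintuitive files. For example, a file can end in .png and actually be a compressed archive (people usually don't do this). In college, some clever classmates hid their little movies this way, quite effectively.

To see a file's actual type, use file; it is clever and recognizes many formats.

[root@localhost ~]# file /etc
/etc: directory
[root@localhost ~]# file /etc/group
/etc/group: ASCII text
[root@localhost ~]# file /dev/log
/dev/log: socket
[root@localhost ~]# file /bin
/bin: symbolic link to `usr/bin'

This part deals with ASCII text, that is, ordinary text files. Next we'll create some files and write content into them for the operations that follow.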

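6.1 Creating a File

6.1.1 Number Sequences

Redirection can create a file directly. Say I want the numbers 10 through 20, one per line, in a file called spring. Conveniently, the seq command does exactly that.

seq 10 20 >> spring

We mentioned earlier that > redirects the output of the preceding command somewhere else. Here we use two of them, >>. It is still a redirection, but it appends to the existing file instead of overwriting it.

In other words, the two correspond to the w and a modes for opening a file in programming languages.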
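A quick way to see the difference between > and >> is to write the same sequence twice and count the lines. This sketch runs in a temporary directory and assumes GNU wc.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 10 20 >> spring            # spring does not exist yet, so >> simply creates it
seq 10 20 >> spring            # >> appends: 11 + 11 lines
appended=$(wc -l < spring)
seq 10 20 > spring             # > truncates the file first, then writes
truncated=$(wc -l < spring)
echo "after two >>: $appended lines; after >: $truncated lines"
```

Running the original seq 10 20 >> spring twice therefore leaves 22 lines in the file, not 11.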

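6.1.2 Viewing Content

To check the generated file, use cat, which prints the file's contents to the terminal. With -n it even prints line numbers:

[root@localhost ~]# cat spring
10
11
12
13
14
15
16
17
18
19
20
[root@localhost ~]# cat -n spring
     1  10
     2  11
     3  12
     4  13
     5  14
     6  15
     7  16
     8  17
     9  18
    10  19
    11  20

Viewing file contents is only one of cat's uses. It finds meaning in life only when combined with other commands.

# Concatenate files a and b and append the result to file c
cat a b >> c
# Feed the contents of file a into a pipeline (pipes are covered later)
cat a | cmd
# Write content into a file; very common in shell scripts, and we'll use it many times later
cat > index.html <<EOF
<html>
  <head><title></title></head>
  <body></body>
</html>
EOF

Our file is small, so cat does no harm. But if a file is several GB, cat is much more dangerous: this little command named after a cat will spew output across your terminal. Press Ctrl+C (several times if needed) to stop it.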

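6.2 Viewing Files Calmly

Since cat isn't suitable for large files, there must be alternatives: less and more. less loads faster than more, so less is the usual choice today. Its main use is paging through a file, with quick ways to search. less is interactive, and you control it with keys.

This time we use seq to generate ten million lines, a full 76 MB, and open the file with less.

[root@localhost ~]# seq 10000000 > spring
[root@localhost ~]# du -h spring
76M spring
[root@localhost ~]# less spring

Common less operations:

Space  scroll down one page

b  scroll up one page

/  enter search mode; for example, /1111 searches for 1111

q  quit less

g  go to the beginning

G  go to the end

j  scroll down

k  scroll up; these two keys work much like in vim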

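6.3 Head and Tail of a File

head shows the beginning of a file and tail the end. Both take -n to set the number of lines.

[root@localhost ~]# head -n 3 spring
1
2
3
[root@localhost ~]# tail -n 3 spring
9999998
9999999
10000000

For some programmers, tail -f may be one of the most used commands. It follows a file's changes in real time in the terminal, which is how you watch rolling logs such as nginx or tomcat logs. Often the log scrolls too fast to read, so you filter it with grep.

# Follow the system log
tail -f /var/log/messages
# Follow only log lines containing info
tail -f /var/log/messages | grep info

tail also has an uppercase option, -F, which keeps following a file after it is recreated. Logs written with tools like log4j, for example, roll over daily, and tail -f can't follow that change.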

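6.4 Finding Files

Consider this scenario: we need a file called decorator.py, a ghost that could be anywhere on the system. find is up to this ghost-hunting job.

We start find from the root directory. Because the system has many files, the command below may take a while.

[root@localhost site-packages]# find / -name decorator.py -type f
/usr/lib/python2.7/site-packages/decorator.py

With time you can see exactly how long it took. It's quite fast: the result appears almost instantly!

[root@localhost site-packages]# time find / -name decorator.py -type f
/usr/lib/python2.7/site-packages/decorator.py

real 0m0.228s
user 0m0.098s
sys 0m0.111s

find returns a set of paths that you usually process further, typically with xargs (which reads its input and passes it as arguments to another command). As for find's -exec option? Forget it, it's awkward!

# Delete all class files in the current directory
find . | grep '\.class$' | xargs rm -rvf
# Find files under /root last accessed one day ago; for the -type letters, see the file types at the start of part 6
find /root -atime 1 -type f
# Find files whose status (ctime) changed within the last 10 minutes
find /root -cmin -10
# Find files owned by root
find /root -user root
# Find files larger than 1 MB and clean them up
find /root -size +1024k -type f | xargs rm -f

find has a huge number of options, so what if you can't remember them? Apart from the common ones, you can look them up with man. man is navigated much like vi: type /EXAMPLES to jump to plenty of examples. Still, I find the commands listed above the most practical.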
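The time and size tests are where find is most often misused, so here is a small sketch. It assumes GNU find and touch: -printf '%f\n' prints bare file names, and touch -d back-dates a file. The file names and sizes are made up. Note the leading + in -size +1024k. Without it, find matches files of exactly that size, rounded up to whole kilobytes.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch new.log
touch -d '3 days ago' old.log            # back-date the modification time
head -c 2097152 /dev/zero > big.bin      # 2 MiB
head -c 1024 /dev/zero > small.bin       # 1 KiB

# -mtime +1: age in whole days (fraction dropped) greater than 1, i.e. at least 2 days old
old=$(find . -type f -mtime +1 -printf '%f\n')
# -mmin -10: modified less than 10 minutes ago
recent=$(find . -type f -mmin -10 -printf '%f\n' | sort)
# -size +1024k: larger than 1024 KiB
big=$(find . -type f -size +1024k -printf '%f\n')

echo "old:    $old"
echo "recent: $(echo "$recent" | tr '\n' ' ')"
echo "big:    $big"
```

Running find with -print first, before piping anything to xargs rm, is a cheap way to check that the tests select what you expect.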

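6.4.1 Where the Data Comes From

In find's options you'll see names like mtime, ctime, and atime. Where does this data come from? That leads naturally to the stat command.

[root@localhost ~]# stat spring
  File: ‘spring’
  Size: 78888897  Blocks: 154080  IO Block: 4096  regular file
Device: fd00h/64768d  Inode: 8409203  Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Context: unconfined_u:object_r:admin_home_t:s0
Access: 2019-11-04 18:01:46.698635718 -0500
Modify: 2019-11-04 17:59:38.823458157 -0500
Change: 2019-11-04 17:59:38.823458157 -0500
 Birth: -

Aren't these just the file's attributes? Everything from size and type to the last modification and access times is here. The Linux file system stores data in blocks. To find where a file's data is stored, each file is indexed by an inode (index node), which you can think of as a file pointer. An inode records:

the file's size in bytes

the file's owner (user)

the file's group (group)

the file's read, write, and execute permissions

the file's timestamps:

ctime: when the inode last changed

mtime: when the file's content last changed

atime: when the file was last accessed (opened)

the link count, i.e. how many file names point to this inode (see the ln command)

the location of the file's data blocks (where the data actually lives)

inodes are a fairly big topic and an important concept; search for more if you're interested. For now we just need to know that this is where the information comes from.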

6.4.2 A Small Exercise

If I only want the Modify value, I can combine the commands learned above: take the last three lines, then the first of those.

[root@localhost ~]# stat spring | tail -n 3 | head -n 1
Modify: 2019-11-04 17:59:38.823458157 -0500
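Counting lines from the bottom depends on how many lines your stat version prints after Modify. The sketch below shows two sturdier alternatives, assuming GNU stat: select the line by its label, or ask stat for the field directly with -c. LC_ALL=C keeps the labels in English.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 10 20 > spring

# Select the line by its label instead of by its position.
by_label=$(LC_ALL=C stat spring | grep '^Modify:')
# %y: mtime in readable form; %Y: the same instant in seconds since the epoch.
readable=$(LC_ALL=C stat -c '%y' spring)
epoch=$(stat -c '%Y' spring)
echo "$by_label"
echo "$readable"
echo "$epoch"
```

The epoch form (%Y) is handy in scripts, because two files' times can then be compared with plain integer tests.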

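Other combinations of commands produce exactly the same output; all roads lead to Rome. Next, let's introduce grep, which shows up all the time. We have also used | many times in these commands. This is the pipe, a very important Linux concept, and we cover it in depth below.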

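6.5 String Matching

grep filters content. With --color it prints matches in color on terminals that support it, and -n prints line numbers so you can locate matches quickly. You must master this command.

For example, to see the POST requests in the nginx log:

grep -rn --color POST access.log

I recommend using these options every time.

To see the lines around an exception, use the A, B, and C options. They are abbreviations and you will use them all the time.

A  after: n lines after the match

B  before: n lines before the match

C  context: n lines before and after the match

Like this:

# Show 2 lines before and 10 lines after each Exception
grep -rn --color Exception -A10 -B2 error.log
# Find every occurrence of import under /usr/, with the file and line number of each
grep -rn --color import /usr/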
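To see exactly what -A and -B return, here is a small sketch with a made-up error.log. It assumes GNU grep, which marks matching lines with N: and context lines with N- when -n is given.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A made-up log for the demo.
cat > error.log <<'EOF'
line1 start
line2 connecting
line3 java.lang.NullPointerException
line4 at Foo.bar
line5 at Foo.main
line6 done
EOF

# -n: line numbers; -B1: one line before each match; -A2: two lines after.
ctx=$(grep -n -B1 -A2 Exception error.log)
echo "$ctx"
```

The output is lines 2 to 5. Only line 3 carries the ":" separator, because it is the one that actually matched.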

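6.6 Pipes

We used | many times in the commands above, and it seems to work magic. | means pipe, and it connects several commands together. Commands are usually combined in these ways:

; sequential execution, e.g. mkdir a;rmdir a

&& conditional execution: the second command runs only if the first succeeds, e.g. mkdir a && rmdir a

|| conditional execution: the second command runs only if the first fails, e.g. mkdir a || rmdir a; if mkdir succeeds, rmdir is not run

| pipe: the output of the preceding command becomes the input of the next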
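The small sketch below exercises each of the first three operators and records which right-hand commands actually ran. It works in a temporary directory and, like the examples above, uses mkdir's own success or failure as the condition.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
log=result.txt

mkdir a ; rmdir a ; echo "; ran every command" >> "$log"
mkdir a && echo "&& ran: mkdir a succeeded" >> "$log"
mkdir a 2>/dev/null && echo "&& skipped" >> "$log"       # a exists now, so mkdir fails and echo is skipped
mkdir a 2>/dev/null || echo "|| ran: mkdir a failed" >> "$log"
mkdir b || echo "|| skipped" >> "$log"                   # mkdir b succeeds, so echo is skipped
cat "$log"
```

Three lines end up in result.txt, and neither "skipped" line appears.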

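The first three are simple and logical, and easy to understand. Pipes, however, have a character of their own.

Anyone who has programmed knows stdin, stdout, and stderr. So let's restate the definition: the output (stdout) of the preceding command becomes the input (stdin) of the next.

Let's illustrate with one line.

seq 20 100 | head -n 50 | tail -n 1

This command outputs 69. 69 is a magical number, so how does it get there? seq prints 20 through 100, head keeps the first 50 of those (20 through 69), and tail keeps the last of them: 69.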
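You can check the computation stage by stage. This sketch needs only seq, head, tail, and wc. It also shows that the pipe behaves like writing the first stage to a temporary file and reading it back.

```bash
#!/usr/bin/env bash
all=$(seq 20 100 | wc -l)                       # 20..100 is 81 numbers
first50=$(seq 20 100 | head -n 50 | tail -n 1)  # head keeps 20..69, tail keeps the last of them

# The same data flow through a temporary file instead of a pipe.
tmpfile=$(mktemp)
seq 20 100 > "$tmpfile"
via_file=$(head -n 50 "$tmpfile" | tail -n 1)
echo "$all $first50 $via_file"
```

The pipe just skips the temporary file: each command's stdout is connected directly to the next command's stdin.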

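Linux abbreviates input, output, and error with numbers. You will see them often in scripts, and even in some installation files.

0  stdin, standard input

1  stdout, standard output

2  stderr, standard error

With syntax like 2>&1, you can send error messages to standard output. Let's prove it with commands.

# The error message does not go into the file
[root@localhost ~]# cat aaaaaaaaa > b
cat: aaaaaaaaa: No such file or directory
[root@localhost ~]# cat b
# The error message is redirected
[root@localhost ~]# cat aaaaaaaaa > b 2>&1
[root@localhost ~]# cat b
cat: aaaaaaaaa: No such file or directory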
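One detail is worth knowing: redirections are applied left to right, so the position of 2>&1 matters. This sketch shows both orders, using a temporary directory and a made-up file name, no-such-file.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# stdout -> b first, then stderr -> wherever stdout points now (b).
cat no-such-file > b 2>&1 || true
# stderr -> the current stdout (the $(...) capture) first, then stdout -> c.
# The error escapes into $escaped, and c stays empty.
escaped=$(cat no-such-file 2>&1 > c || true)

echo "b contains: $(cat b)"
echo "escaped:    $escaped"
```

So "> file 2>&1" sends both streams to the file, while "2>&1 > file" only sends stdout there.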

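6.7 Sorting

Now that you understand how pipes work, we can introduce sort. It is often combined with uniq (deduplication) to sort and deduplicate data. First use cat to create a file with the following content.

cat > sort.txt <<EOF
1 11
3 22
2 44
4 33
5 55
6 66
6 66
EOF

Now let these two commands take the stage. sort takes -t to specify the field separator and -k to specify the column to sort by. For whitespace-separated columns, though, there is no need to specify a separator.

# Sort by the first column in descending order
[root@localhost ~]# cat sort.txt | sort -n -k1 -r
6 66
6 66
5 55
4 33
3 22
2 44
1 11
# Count how many times each line occurs, then sort by that count, descending
# The number of lines drops from 7 to 6
[root@localhost ~]# cat sort.txt | sort | uniq -c | sort -n -k1 -r
      2 6 66
      1 5 55
      1 4 33
      1 3 22
      1 2 44
      1 1 11

Note: uniq only collapses identical lines that are adjacent, so it is normally used on input that is already sorted. In most cases you sort first and then run uniq. Beginners often forget the first step and get unexpected results.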
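Here is a tiny sketch of why the sort step matters: uniq only collapses identical lines that are next to each other.

```bash
#!/usr/bin/env bash
unsorted=$(printf 'b\na\nb\n' | uniq -c)        # b, a, b: no neighbours match, so 3 groups
sorted=$(printf 'b\na\nb\n' | sort | uniq -c)   # a, b, b: the two b's are adjacent, so 2 groups
echo "$unsorted"
echo "---"
echo "$sorted"
```

If all you need is the distinct lines without counts, sort -u does both steps in one command.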

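6.8 Exercises

In this part we started from file attributes, learned several common file commands, and introduced pipes along the way. Let's practice.

Find all grub.cfg files on the system and print their line counts.

Analysis: first use find to locate the files, then pass them on with xargs, and finally count the lines with wc.

[root@localhost grub2]# find / | grep grub.cfg | xargs wc -l
141 /boot/grub2/grub.cfg

Print the list of groups on the system:

cat /etc/group | awk -F ':' '{print $1}'

Print each IP in the nginx log with its PV, showing the 10 IPs with the highest PV:

awk -F"|" '{print $3}' access.log | sort | uniq -c | sort -nk1 -r | head -n10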
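For the group-list exercise, cut can do the same job as awk when the separator is a single character. The sketch compares the two on a made-up file in /etc/group's name:x:gid:members layout, so it doesn't depend on the groups on your machine. It also shows that awk can read the file directly instead of through cat.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up lines in the name:x:gid:members layout of /etc/group.
cat > group.sample <<'EOF'
root:x:0:
zhang3:x:1000:li4
li4:x:1001:
EOF

by_awk=$(awk -F ':' '{print $1}' group.sample)   # awk reads the file itself; no cat needed
by_cut=$(cut -d ':' -f 1 group.sample)          # -d: delimiter, -f: field number
echo "$by_awk"
[ "$by_awk" = "$by_cut" ] && echo "awk and cut agree"
```

cut is enough for a single-character delimiter. awk pays off once fields are separated by runs of spaces or you need conditions and arithmetic.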

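6.9 Thinking & Extensions

1. How does the Linux terminal produce colored text? How would I print a green Hello World?

2. What is the difference between a symbolic link and a hard link?

3. Get to know a few commands that are offbeat, but not too offbeat:

cut  with awk around, you'll hardly ever use cut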

col

paste

join

split

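7. Regular Expressions and Advanced Usage

You may run into a tricky problem and find the answer by searching, only to search all over again the next time. That kind of inefficiency isn't what we want. A typical case is a production operations engineer: when a problem hits, there is little time to learn on the spot.

To train more efficiently, we do two things: first, summarize and generalize; second, draw inferences from one case to others. Linux commands are no different. A problem usually has several solutions, and you should look for what they have in common.

This works because command designers follow established conventions. Generally, you only need to master a small set of commands and know many others by sight to move freely in the command-line world. For example, knowing that ls lists files and directories, you can guess that lscpu lists CPU information, lsmem lists memory information, and lsblk lists disks. There are many such families, such as the top family and the stat family.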

7.1 Supporting Information

7.1.1 Linux File Formats

Working on Linux means avoiding binary formats as much as possible; almost everything is readable, writable text. Most commands also produce text output. This output has some common traits: columns are usually separated by spaces or <TAB>. Take the output of lsmem below: regular, well-structured output like this is very easy to process.

[root@localhost ~]# lsmem
RANGE SIZE STATE REMOVABLE BLOCK
0x0000000000000000-0x0000000007ffffff 128M online no 0
0x0000000008000000-0x000000000fffffff 128M online yes 1
0x0000000010000000-0x0000000017ffffff 128M online no 2
0x0000000018000000-0x0000000027ffffff 256M online yes 3-4
0x0000000028000000-0x000000004fffffff 640M online no 5-9
0x0000000050000000-0x000000005fffffff 256M online yes 10-11
0x0000000060000000-0x000000007fffffff 512M online no 12-15

Memory block size: 128M
Total online memory: 2G
Total offline memory: 0B

There is a whole set of commands that operate on lines, and another set that operate on columns. Two of them bring it all together: sed and awk. They cover so much ground that each gets its own chapter.

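7.1.2 What If You Can't Remember a Command?

Linux commands are usually quite simple, but some are genuinely complex. Covering every use of commands like find and ps would take an enormous amount of space. So what do you do when you hit one of the rare cases?

A broad overview is very worthwhile, so that when you need something, it triggers at least a faint memory. The rest you can leave to commands like man. Every Linux command comes with its own help documentation, which is far more accurate than information copied around the web.

Let's formally introduce these help tools:

man  shows the documentation for a command, e.g. man ls

info  you can treat it as the same as man, though the two partly complement each other, and the text will point you from one to the other

--help  many commands print very brief help when given --help. This is often the quickest and most useful usage summary. If you just can't remember some awkward option name, look here.

Note: this help focuses only on the command itself and says little about combining it with others. In other words, it teaches you how to use a command, but not what you can do with it.

These help pages usually highlight keywords to make reading easier. We can go a step further and make man pages colorful. As root, run the following command, then log in to the VM again.

cat >> ~/.bashrc <<'EOF'
# Quoting 'EOF' stores the function verbatim, so $(printf ...) and "$@" are evaluated when man runs
function man(){
    env \
    LESS_TERMCAP_mb=$(printf "\e[1;31m") \
    LESS_TERMCAP_md=$(printf "\e[1;31m") \
    LESS_TERMCAP_me=$(printf "\e[0m") \
    LESS_TERMCAP_se=$(printf "\e[0m") \
    LESS_TERMCAP_so=$(printf "\e[1;44;33m") \
    LESS_TERMCAP_ue=$(printf "\e[0m") \
    LESS_TERMCAP_us=$(printf "\e[1;32m") \
    man "$@"
}
EOF

Run man again and you'll see colored output.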

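7.1.3 TAB Completion

Now type ca in the terminal and quickly press <TAB> twice. The shell enters completion mode and shows all commands that start with ca.

[root@localhost ~]# ca
cacertdir_rehash  cache_dump  cache_repair  cache_writeback  ca-legacy  capsh  case  catchsegv
cache_check  cache_metadata_size  cache_restore  cal  caller  captoinfo  cat  catman

If you only vaguely remember a command, say just its first few letters, this feature is great: the candidates narrow down as you type.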

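7.2 Regular Expressions

Before going further, let's introduce regular expressions. Some of the commands above, such as less and grep, also accept them.

Entire books have been written on regular expressions. We only give a brief introduction here, but it is enough. Regular expressions can be used for matching, and the matched text can also be reused. We'll cover reuse together with sed.

Symbol   Meaning
^        start of line
$        end of line
.        any single character
*        zero or more of the preceding item
+        one or more of the preceding item
?        zero or one of the preceding item
{m}      the preceding item repeated exactly m times
{m,n}    the preceding item repeated m to n times
[]       one character from the specified set or range
[^]      any single character outside the specified set
\        escape character
[0-9]    any one character in the brackets (acts as an "or")
|        or (alternation)
\b       word boundary; for example, \blucky\b matches only the word lucky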

Create a file with the command below, and let's practice how grep handles regular expressions when given -E.

cat > 996 <<EOF
996: 996 is a funcking thing . which make woman as man , man as ass .
we all on the bus , bus bus on the way . 996
way to icu. icuuuuuu......
The greedy green boss rides on the pity programmer
EOF

Run the commands below in the terminal; the highlighted parts are the matched strings. The patterns are quoted so that the shell passes characters like |, [ ] and { } to grep untouched.

# Match lines starting with 996
[root@localhost ~]# cat 996 | grep -E '^996'
996: 996 is a funcking thing . which make woman as man , man as ass .
# Match lines ending with 996
[root@localhost ~]# cat 996 | grep -E '996$'
we all on the bus , bus bus on the way . 996
# Match both icu and icuuuuuu
[root@localhost ~]# cat 996 | grep -E 'icu+'
way to icu. icuuuuuu......
# Match 996 again, via any digit
[root@localhost ~]# cat 996 | grep -E '[0-9]'
996: 996 is a funcking thing . which make woman as man , man as ass .
we all on the bus , bus bus on the way . 996
# Match lines that do not start with a digit
[root@localhost ~]# cat 996 | grep -E '^[^0-9]'
we all on the bus , bus bus on the way . 996
way to icu. icuuuuuu......
The greedy green boss rides on the pity programmer
# Match all lines that don't contain 996: a command with a conscience, tears of joy
[root@localhost ~]# cat 996 | grep -E -v '[0-9]{3}'
way to icu. icuuuuuu......
The greedy green boss rides on the pity programmer
# Match boss or icu
[root@localhost ~]# cat 996 | grep -E 'boss|icu'
way to icu. icuuuuuu......
The greedy green boss rides on the pity programmer
# Match all lines
[root@localhost ~]# cat 996 | grep -E .
996: 996 is a funcking thing . which make woman as man , man as ass .
we all on the bus , bus bus on the way . 996
way to icu. icuuuuuu......
The greedy green boss rides on the pity programmer
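This sketch covers two follow-ups and assumes GNU grep, whose -E mode supports \b as a word boundary. First, quote the pattern so the shell does not interpret |, [ ] or { } itself. Second, compare + with word boundaries. It reuses three lines of the 996 file in a temporary directory.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > 996 <<'EOF'
we all on the bus , bus bus on the way . 996
way to icu. icuuuuuu......
The greedy green boss rides on the pity programmer
EOF

# Unquoted, boss|icu would be split by the shell into a pipe into a command named icu.
either=$(grep -cE 'boss|icu' 996)        # lines containing boss or icu
plus=$(grep -oE 'icu+' 996 | wc -l)      # -o prints each match: "icu" and "icuuuuuu"
word=$(grep -oE '\bicu\b' 996 | wc -l)   # only the stand-alone word icu
buses=$(grep -oE '\bbus\b' 996 | wc -l)  # the word bus, counted per occurrence
echo "either=$either plus=$plus word=$word buses=$buses"
```

-c counts matching lines, while -o combined with wc -l counts individual matches. That is why bus, which appears three times on one line, gives 3 only with -o.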

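Regular expressions are very important. In sed scripts, awk scripts, and even the vim editor, they simplify your work. Learn the material above by heart, until you no longer need to look it up.

Here are 6 small questions to think about.

1. Go back and run man cat. Did you notice a command called tac? What does it do?

2. Given the stat family mentioned above, can you guess roughly what iostat is for?

3. What does grep -v mean?

4. Look into the rename command, which is much like mv, for batch-renaming files, and see whether you can use the regular expressions above with it.

5. If you misspell a command, how can you fix it quickly? By searching? Look into the fuck command. I'm not kidding.

6. Which of the following means "if cmd1 succeeds, run cmd2"?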

A. cmd1&&cmd2

B. cmd1|cmd2

C. cmd1;cmd2

D. cmd1||cmd2

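8. Compression on Linux

Compression is a truly magical thing.

Long, long ago, I came across some 64 KB movies that took more than half an hour to watch. Their actual size was 15 GB; the Warez group had compressed them 250,000 times.

If you're on Windows, you can download one here and try it. But we're talking about Linux right now, which is a bit embarrassing, isn't it?

Link: https://pan.baidu.com/s/12YJQ4jsbtRr7RxoLpARTyQ Extraction code: r7sp

Compression is a magical thing. It can make things big or small, stretch them or shrink them, and it's hard to find anything like it in real life.

To reduce transfer size, or just to make transfer convenient, files are usually compressed. Common archive and compression formats on Linux include tar, bzip2, zip, and rar; 7z is used less often. Compressed files are well suited to transfer over a network. You could even regard an ISO file as a special kind of archive.

.tar  created or extracted with tar
.bz2  handled by bzip2
.gz   handled by gzip
.zip  extracted with unzip
.rar  extracted with unrar
.Z    handled by compress and uncompress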

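Preparation: use the commands below to make 1000 copies of a file.

cd ~
mkdir files
cd files
seq 1000 | xargs -I {} cp /etc/group {}

Run ls and you'll see the 1000 files we just created. Next, we pack them into a single file with compression commands.

# Check the total size of the 1000 files
[root@localhost files]# du -h .
4.0M .
# Switch to root's home directory
cd ~
# Bundle them with tar. tar only packs, it does not compress; the size drops mainly because each tiny file used to occupy a whole 4 KB block
[root@localhost ~]# tar cvf files.tar files
[root@localhost ~]# du -h files.tar
1012K files.tar
# Compress with gzip; the result is only 12 KB
[root@localhost ~]# gzip files.tar
[root@localhost ~]# du -h files.tar.gz
12K files.tar.gz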

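tar and gzip are usually used together. tar has a special feature: while packing or unpacking, it can call other compression programs such as gzip or bzip2.

The command below is equivalent to running the two commands above, so it is the usual way to do it.

[root@localhost ~]# tar cvfz files2.tar.gz files
[root@localhost ~]# du -h files2.tar.gz
12K files2.tar.gz

The counterpart is extraction, and it only takes changing one letter in the command: c -> x. In fact, v and z can also be left out.

[root@localhost ~]# tar xvfz files2.tar.gz

More commonly, we add -C to choose the directory to extract into. For example, the command below extracts the contents into /opt.

[root@localhost ~]# tar xvfz files2.tar.gz -C /opt/

What if I only want to see which files the archive contains? That is what t is for.

c  create (pack)

x  extract

t  list the contents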
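Here c, t, and x are put together in a scratch directory. The sketch assumes GNU tar with gzip available, and it creates five tiny files instead of the 1000 copies above.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir files
for i in 1 2 3 4 5; do echo "file $i" > "files/$i"; done

tar czf files.tar.gz files         # c: create, z: gzip, f: archive file name
listing=$(tar tzf files.tar.gz)    # t: list the contents without extracting
mkdir out
tar xzf files.tar.gz -C out        # x: extract, -C: into the out directory
echo "$listing"
diff -r files out/files && echo "extracted copy is identical"
```

Listing with t before extracting is a good habit: it shows whether the archive has a top-level directory or will scatter files into the current one.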

Installing Other Tools

Let's check whether the common zip and rar extractors are installed.

[root@localhost ~]# which unzip
/usr/bin/which: no unzip in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
[root@localhost ~]# which unrar
/usr/bin/which: no unrar in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)

So neither program is installed on our system. Let's install them with CentOS's package manager, yum. Java's jar command works much like zip; explore it on your own.

[root@localhost ~]# yum install -y zip unzip rar unrar
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * extras: mirrors.tuna.tsinghua.edu.cn
 * updates: mirrors.aliyun.com
...

rar fails to install, so rar files still can't be extracted. No matter; we'll install it in a later chapter.

Now, can you install tomcat on Linux?

接下来，我们思考一下：

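1. If a file has already been compressed with zip, will compressing it again with gzip make it any smaller?

To check this, we can use the dd command to generate a 69MB file of random data. dd is a pretty magical command too.

[root@localhost ~]# dd if=/dev/urandom of=test bs=1M count=69
69+0 records in
69+0 records out
72351744 bytes (72 MB) copied, 0.446161 s, 162 MB/s
[root@localhost ~]# du -h test
69M test

So, going back to the start of this section, we can generate a batch of larger files the same way (256 KB each, filled from /dev/zero) to make the compression experiment more meaningful.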

seq 1000 | xargs -i dd if=/dev/zero of={}.xjj bs=1k count=256
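One way to explore question 1 is sketched below. To keep it to a single tool, it runs gzip twice rather than zip followed by gzip, which is an assumption of this sketch. It also compares incompressible random bytes with highly compressible zero bytes, since the output of a good compressor looks close to random to a second compressor.

```shell
#!/usr/bin/env bash
# Does compressing already-compressed data make it smaller? Compare byte counts.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

dd if=/dev/urandom of=random.bin bs=1M count=4 status=none   # 4 MiB of random bytes
dd if=/dev/zero    of=zeros.bin  bs=1M count=4 status=none   # 4 MiB of zero bytes

gzip -c random.bin    > random.bin.gz   # first pass over random data
gzip -c random.bin.gz > twice.gz        # second pass over already-compressed data
gzip -c zeros.bin     > zeros.bin.gz    # zeros compress extremely well

for f in random.bin random.bin.gz twice.gz zeros.bin zeros.bin.gz; do
  printf '%-14s %9d bytes\n' "$f" "$(stat -c %s "$f")"
done
```

Compare the listed sizes yourself. gzip -c is used instead of gzip -k because -k is missing from older gzip versions such as the one in CentOS 7.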

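2. If the files already exist, how does tar force them to be overwritten?

9. Linux's permission system

Right at the beginning, when we first touched the command line, we used chmod to give a plain text file execute permission. This section looks at two closely related concepts: user permissions and file permissions.

9.1 Adding users

So far our system has only this one lonely user. Time to take a page from Nüwa and mold a few little clay figures.

First, create two users: Zhang San (zhang3) and Li Si (li4).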

[root@localhost ~]# useradd zhang3

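Look at the output of the three commands below.

# The system now has a group named zhang3; /etc/group stores the system's group information
[root@localhost ~]# tail -n1 /etc/group
zhang3:x:1000:
# The system now has a user named zhang3; /etc/shadow stores password hashes ("!!" means no password has been set yet). Many intrusions aim to steal this file and brute-force it
[root@localhost ~]# tail -n1 /etc/shadow
zhang3:!!:18207:0:99999:7:::
# /home now contains a directory named zhang3
[root@localhost ~]# ll /home --full-time
total 0
drwx------. 2 zhang3 zhang3 83 2019-11-06 22:09:33.357165082 -0500 zhang3

Next, use passwd to set a password for the user we just created. The password has to be entered twice to confirm it. passwd is also how you change a password later. To set passwords for many users at once, use chpasswd, which reads user:password lines from standard input.

[root@localhost ~]# passwd zhang3
Changing password for user zhang3.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.

How do you delete an existing user? With the userdel command. The -f option forces the removal even if the user is currently logged in. Add -r to delete the user's home directory as well.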

userdel -f zhang3

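9.2 File permissions explained

The output above shows two interesting things. Adding a user added entries not only to the password file shadow but also to the group file. This involves two attributes of a user: the user name and the group name.

A user has exactly one user name but can belong to several groups. The command below creates the user li4 and adds it to the supplementary group zhang3 (-G), and you can see the change in /etc/group. It also tries to set the password 123 with -p. However, -p expects an already-encrypted password: it stores the literal string 123 in /etc/shadow, so li4 cannot log in with 123. Use passwd li4 to set a real password.

[root@localhost ~]# useradd -G zhang3 -p 123 li4
[root@localhost ~]# tail -n 2 /etc/group
zhang3:x:1000:li4
li4:x:1001: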
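Each /etc/group line has four colon-separated fields: group name, password placeholder, GID, and a comma-separated list of supplementary members. The minimal sketch below parses the two lines shown above. It reads them from a variable rather than from /etc/group so that it runs anywhere. On a live system, id -nG li4 answers the same question directly.

```shell
#!/usr/bin/env bash
# Parse /etc/group-style lines: name:password:GID:member1,member2,...
set -euo pipefail

# Sample copied from the tail -n 2 /etc/group output above.
sample='zhang3:x:1000:li4
li4:x:1001:'

# Show name, GID and members ("-" when the member list is empty).
printf '%s\n' "$sample" |
  awk -F: '{ printf "%-8s gid=%-5s members=%s\n", $1, $3, ($4 == "" ? "-" : $4) }'

# List the groups that name li4 as a supplementary member.
supp=$(printf '%s\n' "$sample" |
  awk -F: '{ n = split($4, m, ","); for (i = 1; i <= n; i++) if (m[i] == "li4") print $1 }')
echo "li4 is a supplementary member of: $supp"
```

li4's primary group (li4) does not appear in the member list, because the primary group is recorded in /etc/passwd, not here.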

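Now let's move on to file permissions. To test the commands below, first create a script named confirm777.sh. So that every user can see it, we create it under /tmp.

cat > /tmp/confirm777.sh <<EOF
echo $USER
id
EOF

Use ll to look at the file. Because EOF is not quoted, the shell expands $USER while writing the file, so the script actually contains echo root. That is why the file is 13 bytes, and why every run below prints root first. Quote the delimiter (<<'EOF') if you want $USER to be evaluated when the script runs.

[root@localhost ~]# ll /tmp/confirm777.sh --full-time
-rw-r--r--. 1 root root 13 2019-11-07 04:25:55.418254935 -0500 confirm777.sh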

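The ll output shows that the file is owned by the root user and belongs to the root group, and that its permissions are rw-r--r--. File permissions come in three parts:

Owner permissions, abbreviated u: what the file's owner may do. Here that is the root user, with rw-.

Group permissions, abbreviated g: what the members of the file's group (here the root group) may do: r--.

Other permissions, abbreviated o: what all other, unrelated users may do. For the zhang3 and li4 users we just created, that is r--.

All, abbreviated a: applies to all three classes above at once.

So what do letters like rw- mean?

r  read permission (read).

w  write permission (write).

x  execute permission (execute).

-  placeholder: the permission is not granted.

Note: having w on a file does not mean you can delete it. w applies only to the file's contents; deleting a file depends on write permission on the directory that contains it.

A file has 3 classes of users, and each class has 3 permissions. Simple primary-school multiplication tells us that a file's permissions need 3 x 3 = 9 bits.

Our file is called confirm777.sh. Was that name chosen at random? Of course not. In Linux, 777 has a special meaning: the file is readable, writable and executable by every user. Imagine if every file had these permissions: the system would have no security at all. So where does this string of digits come from? See the table below.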

r  4  read
w  2  write
x  1  execute

Combining these three values in every possible way gives:

Digit  Symbolic  r+w+x
4      r--       4+0+0
6      rw-       4+2+0
5      r-x       4+0+1
2      -w-       0+2+0
3      -wx       0+2+1
1      --x       0+0+1
7      rwx       4+2+1
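The table can be turned into a small converter. Below is a minimal sketch of a hypothetical helper, perm_to_octal, written only for this example. It adds 4, 2 and 1 for r, w and x within each class of three characters, and it deliberately ignores the special s/S/t/T letters used for setuid, setgid and the sticky bit.

```shell
#!/usr/bin/env bash
# Convert a 9-character permission string such as rw-r--r-- into octal.
# perm_to_octal is a hypothetical helper for illustration only.
set -euo pipefail

perm_to_octal() {
  local perms=$1 result="" class i digit ch
  for class in 0 3 6; do            # offsets of owner (u), group (g), others (o)
    digit=0
    for i in 0 1 2; do
      ch=${perms:class+i:1}
      case $ch in
        r) digit=$((digit + 4)) ;;
        w) digit=$((digit + 2)) ;;
        x) digit=$((digit + 1)) ;;
      esac                          # "-" contributes 0
    done
    result+=$digit
  done
  echo "$result"
}

perm_to_octal rw-r--r--   # 644
perm_to_octal rwxr--r--   # 744
perm_to_octal rwxrwxrwx   # 777
```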

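9.3 Changing file permissions

Below are three commands related to file permissions; chown and chmod are the ones used most often.

chown  change a file's owner
chgrp  change a file's group
chmod  change a file's permissions

Next, change the owner and group of confirm777.sh to zhang3, the user we just created.

cd /tmp
[root@localhost tmp]# chown zhang3:zhang3 confirm777.sh
[root@localhost tmp]# ll confirm777.sh
-rw-r--r--. 1 zhang3 zhang3 13 Nov 7 04:25 confirm777.sh

Add execute permission for the file's owner, then switch to zhang3 and li4 in turn and run it.

The su command switches to another user. su - is generally used, so that the environment variables are reset to those of a fresh login. The id command shows information about the user who is currently running it.

[root@localhost tmp]# chmod u+x confirm777.sh
[root@localhost tmp]# su li4
[li4@localhost tmp]$ ./confirm777.sh
bash: ./confirm777.sh: Permission denied
[li4@localhost tmp]$ exit
exit
[root@localhost tmp]# su zhang3
[zhang3@localhost tmp]$ ./confirm777.sh
root
uid=1000(zhang3) gid=1000(zhang3) groups=1000(zhang3) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

As you can see, the owner zhang3 can run the file, while the unrelated li4 is refused permission. Next, let's verify the group permission bits.

# Remove zhang3's execute permission
[root@localhost tmp]# chmod u-x confirm777.sh
[root@localhost tmp]# ll confirm777.sh
-rw-r--r--. 1 zhang3 zhang3 13 Nov 7 04:25 confirm777.sh
# Add execute permission for the zhang3 group; li4 is in that group, so it gets the permission
[root@localhost tmp]# chmod g+x confirm777.sh
[root@localhost tmp]# ll confirm777.sh
-rw-r-xr--. 1 zhang3 zhang3 13 Nov 7 04:25 confirm777.sh
# Switch to zhang3 and run it
[root@localhost tmp]# su zhang3
[zhang3@localhost tmp]$ ./confirm777.sh
bash: ./confirm777.sh: Permission denied
[zhang3@localhost tmp]$ exit
exit
# Switch to li4 and run it
[root@localhost tmp]# su li4
[li4@localhost tmp]$ ./confirm777.sh
root
uid=1001(li4) gid=1001(li4) groups=1001(li4),1000(zhang3) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

The output shows that this time li4 can run the file, while zhang3 cannot. Only the first matching class counts: zhang3 is the owner, so the owner bits (rw-) apply and the group bits are never consulted, even though zhang3 also belongs to the zhang3 group.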

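We changed the permissions with chmod using letters, as in a+x. Take the first script as an example: its initial permissions were rw-r--r--, i.e. 644. In that case, the two commands below are equivalent.

chmod u+x confirm777.sh
chmod 744 confirm777.sh

The second command uses numeric permission bits, which requires an extra conversion step in your head and is quite inconvenient in everyday use. The symbolic notation is more intuitive and is highly recommended.

To show this process more visually, I made a diagram specifically for it.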
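To check the equivalence yourself, the sketch below applies both forms to scratch files and prints the resulting modes. It assumes GNU coreutils, where stat -c %a prints the octal mode. The file names are throwaway.

```shell
#!/usr/bin/env bash
# Check that symbolic and numeric chmod produce the same mode (GNU stat assumed).
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

touch "$work/sym.sh" "$work/num.sh" "$work/eq.sh"
chmod 644 "$work/sym.sh" "$work/num.sh"   # start both from rw-r--r--

chmod u+x "$work/sym.sh"                  # symbolic: add x for the owner only
chmod 744 "$work/num.sh"                  # numeric: spell out all three digits
chmod u=rwx,g=rx,o= "$work/eq.sh"         # "=" sets each class exactly: 750

stat -c '%a %A %n' "$work/sym.sh" "$work/num.sh" "$work/eq.sh"
```

The symbolic form only touches the bits you name, while the numeric form rewrites all nine bits at once. That is why u+x is the safer choice when you only want to add one permission.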

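9.4 Directory permissions

Here's something interesting. Setting execute permission on a file turns an ordinary file into a script, but what on earth does execute permission on a directory mean? For directories:

r  allows reading the names of the files in the directory, but not entering it

w  allows modifying the directory: creating, moving, deleting and renaming the files in it (this also requires x)

x  allows entering the directory with cd and accessing the files in it by name, including their attributes; listing the names still requires r

To feel the difference between r and x, study the command output below carefully. Almost every directory normally has execute permission; don't change it casually.

[root@localhost tmp]# su - li4
[li4@localhost ~]$ cd /tmp
[li4@localhost tmp]$ mkdir nox
[li4@localhost tmp]$ touch nox/{a,b,c,d}
[li4@localhost tmp]$ chmod a-x nox
[li4@localhost tmp]$ ls nox
ls: cannot access nox/a: Permission denied
ls: cannot access nox/b: Permission denied
ls: cannot access nox/c: Permission denied
ls: cannot access nox/d: Permission denied
a  b  c  d
[li4@localhost tmp]$ cat nox/a
cat: nox/a: Permission denied
[li4@localhost tmp]$ chmod a+x nox
[li4@localhost tmp]$ chmod a-r nox
[li4@localhost tmp]$ ls nox
ls: cannot open directory nox: Permission denied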
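The mode changes behind that transcript can be replayed in a scratch directory. This is a minimal sketch assuming GNU stat. It checks only the mode strings, not the access failures, because root bypasses these permission checks; that is also why the transcript above switches to li4. The umask is pinned so that the starting mode is predictable.

```shell
#!/usr/bin/env bash
# Watch how chmod a-x and a-r change a directory's mode string (GNU stat assumed).
set -euo pipefail
umask 022                               # new directories start as 755

work=$(mktemp -d)
trap 'chmod 755 "$work/nox" 2>/dev/null || true; rm -rf "$work"' EXIT

mkdir "$work/nox"
touch "$work/nox/"{a,b,c,d}
before=$(stat -c %A "$work/nox")        # drwxr-xr-x

chmod a-x "$work/nox"
no_x=$(stat -c %A "$work/nox")          # drw-r--r--: names readable, files unreachable

chmod a+x,a-r "$work/nox"
no_r=$(stat -c %A "$work/nox")          # d-wx--x--x: cd works, listing does not

printf '%s\n' "$before" "$no_x" "$no_r"
```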

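9.5 The sticky bit

Next, let's look at the rather mind-bending sticky bit.

To delete a file, you don't need write permission on the file itself, but you must have write permission on its parent directory. So how do you create a directory where anyone can write files but nobody can delete other users' files? That is exactly what the sticky bit is for. The sticky bit is generally used only on directories; Linux ignores it on files. It is shown as t.

Let's look at a typical example, the /tmp directory.

[root@localhost tmp]# ls -dl /tmp
drwxrwxrwt. 9 root root 4096 Nov 7 06:27 /tmp

The last position shows t instead of x. This means the sticky bit is set on top of x for others (without x it would show a capital T), so ordinary users cannot delete other users' files. Every user can create files freely in /tmp but cannot delete anyone else's, even when the file's permissions are 777.

[root@localhost tmp]# touch /tmp/stick
[root@localhost tmp]# chown li4:li4 /tmp/stick
[root@localhost tmp]# chmod 777 /tmp/stick
[root@localhost tmp]# su - zhang3
[zhang3@localhost ~]$ rm /tmp/stick
rm: cannot remove ‘/tmp/stick’: Operation not permitted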
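Setting the sticky bit yourself uses a fourth, leading octal digit. The minimal sketch below, assuming GNU stat, gives a scratch directory the same mode as /tmp and then shows the capital T that appears when others lose x.

```shell
#!/usr/bin/env bash
# Set the sticky bit on a scratch directory and read it back (GNU stat assumed).
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/shared"
chmod 1777 "$work/shared"        # leading 1 = sticky bit, 777 = rwx for everyone (like /tmp)
with_x=$(stat -c '%a %A' "$work/shared")

chmod o-x "$work/shared"         # sticky bit kept, but others lose x
without_x=$(stat -c '%a %A' "$work/shared")

printf '%s\n' "$with_x" "$without_x"   # 1777 drwxrwxrwt, then 1776 drwxrwxrwT
```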

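Above, we created two users, zhang3 and li4, used them to test the chown and chmod commands, and finally introduced the sticky bit. One reason Linux is relatively secure is its fine-grained permission model. But permissions are a double-edged sword: a single command from a superuser can bring down the whole system, and many hidden trojans escalate their privileges to run in places nobody knows about.

The permission commands are used all the time; here are a few examples.

# Set the owner and group of /var/lib/mysql to mysql
chown -R mysql:mysql /var/lib/mysql
# Make the directory readable and writable so that files can be uploaded
chmod 777 /var/www/uploads
# Add execute permission to every .sh file in the current directory
chmod a+x *.sh
# Make file readable, writable and executable by everyone
chmod u=rwx,g=rwx,o=rwx file

Time to think again:

1. What happens after you run the command below? Warning: do not run it, not even with 000 changed to another number.

# -R means recurse into subdirectories
chmod -R 000 /

2. One day I came across the command chmod u+s file. The article never explained s; what does it mean?

3. How do you delete a user's group?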

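10. How do you work with disks?

The following scenario is terrifying; for some programmers it can be a nightmare.

Early one morning, you've just arrived at work and eaten half of your jianbing guozi when several fully armed police officers burst into the company. Without a word they take control of the staff, then sit and wait for the rest of the fish to show up for work.

The reason: the company is suspected of storing and distributing illegal files, and all of its servers are to be seized for a thorough investigation.

Some of these files sit directly on disks, some on file storage servers, and some have been split into chunks spread across different cheap machines.

What happens next depends on the skill of the technical staff, but the outcome probably won't be good.

In the previous section we created two ordinary users. They can't do much: their permissions are far smaller than those of root, the default user. Apart from the files in their own directories, they have almost no permission to modify anything.

These files have to be stored on disks. There are a great many commands for managing disks. System administrators use the material in this section all the time; developers need much less of it, because a developer only needs to know where the "little movies" are stored, not how they are stored.

So when it comes to conviction, who carries more of the blame, ops or the programmers? It's really a paradox. While staring into space, the ops engineer recalls the following knowledge.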

10.1 Adding a new disk

If you're a system administrator, even one who has moved to the cloud, and you've just bought an extra AWS volume, it can't be used right away. It has to be formatted and mounted before it can go into production.

Remember installing the system? One step was partitioning the virtual machine's disk, and we simply took the defaults. That can't be changed now; it's in the past.

To practise disk management, we first need to add a new virtual disk to the virtual machine. Shut the machine down with shutdown -h now, then do the following.

1. Open Settings, switch to Storage, and add a disk

2. Click to create a new disk

3. Choose VDI

4. Choose dynamically allocated, so it grows as it is used

5. Create a 2GB disk named disk2

Start the machine, connect remotely to its 192.x IP address, and don't forget to run the dhclient command.

First, use fdisk to look at the current state of the disks.

[root@localhost ~]# fdisk -l
Disk /dev/sda: 8589 MB, 8589934592 bytes, 16777216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000c2410
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     2099199     1048576   83  Linux
/dev/sda2         2099200    16777215     7339008   8e  Linux LVM
Disk /dev/sdb: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/centos-root: 6652 MB, 6652166144 bytes, 12992512 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/centos-swap: 859 MB, 859832320 bytes, 1679360 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

The output shows two disks, /dev/sda and /dev/sdb. sda has already been allocated and is used by our file systems. Recall what the /dev directory is for: it holds device files. If we added yet another disk, its device file would be sdc (following the sd* pattern).

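Before this whole disk can be used, we need to do three things to it:

Partition the disk

Format the disk

Mount the disk

10.2 Partitioning

Partitioning is again done with fdisk. The command below enters interactive mode; type n to create a new partition. Since our disk is only 2GB, a single partition is enough. Accept the defaults at each prompt, and finally type w to write the partition table and leave the interactive session.

Run fdisk -l again and you'll see a new partition of about 2GB.

[root@localhost ~]# fdisk /dev/sdb
...
[root@localhost ~]# fdisk -l
...
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048     4194303     2096128   83  Linux
...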

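10.3 Formatting

On the command line, type mkfs and press <TAB> to complete it; a list of commands is shown.

[root@localhost ~]# mkfs.
mkfs.btrfs   mkfs.cramfs  mkfs.ext2    mkfs.ext3    mkfs.ext4    mkfs.minix   mkfs.xfs

Any of these commands can format a disk. ext4 is currently the most widely used format. Windows formats such as FAT and NTFS don't appear in the list, but conceptually they are the same kind of thing.

Here are the disk formats commonly used on Linux.

btrfs  GPL-licensed; started as a replacement for the ext file systems. I'm not familiar with it, so I won't comment further.

cramfs  a read-only compressed file system designed specifically for flash memory; its capacity limit is 256MB and it uses zlib compression. Rarely used.

ext2  an early version of ext.

ext3  an improved ext2 that adds journaling.

ext4  the most widely used. If you don't know the others, just stick with ext4.

minix  quite old and rarely used.

xfs  a 64-bit, high-performance journaling file system originally developed by SGI (it is not derived from ext). It has been the default file system since CentOS 7.0.

When in Rome, do as the Romans do: we'll format the disk as xfs.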

[root@localhost ~]# mkfs.xfs /dev/sdb1

Note: to format a disk as FAT32, you need to install an extra package (dosfstools, which provides mkfs.vfat).

yum install dosfstools -y

10.4 Mounting

The last step is to mount the disk with the mount command. We'll mount it at /data.

df shows the system's disk usage. The -h option means human-readable, so sizes are displayed in an easy-to-read form. lsblk shows how the disks are mounted from a different angle.

[root@localhost ~]# mkdir /data
[root@localhost ~]# mount /dev/sdb1 /data
[root@localhost ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 908M     0  908M   0% /dev
tmpfs                    920M     0  920M   0% /dev/shm
tmpfs                    920M  8.6M  911M   1% /run
tmpfs                    920M     0  920M   0% /sys/fs/cgroup
/dev/mapper/centos-root  6.2G  1.4G  4.9G  22% /
/dev/sda1               1014M  149M  866M  15% /boot
tmpfs                    184M     0  184M   0% /run/user/0
/dev/sdb1                2.0G   33M  2.0G   2% /data
[root@localhost ~]# lsblk -f
NAME            FSTYPE      LABEL UUID                                   MOUNTPOINT
sda
├─sda1          xfs               ac3a3ce8-6ab1-4c0b-91c8-b4e617f0dfb4   /boot
└─sda2          LVM2_member       3GzmOd-TUc1-p2ZN-wT5q-ttky-to9l-PF495o
  ├─centos-root xfs               9f86f663-060a-4450-90f9-3005ad9c5d92   /
  └─centos-swap swap              cf8709b0-b0ab-4d44-a23e-ad76f85efad6   [SWAP]
sdb
└─sdb1          xfs               0a7c861c-1a31-45b3-bf37-c72229f35704   /data
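When a script needs the same information that df -h shows, the POSIX output format (df -P) is easier to parse, because each file system stays on a single line. Below is a minimal sketch that reads the Use% value for one mount point. The function name usage_percent and the 90% threshold are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Read the Use% column for the file system that holds a given path.
set -euo pipefail

usage_percent() {
  # df -P: line 1 is the header, line 2 the data row; field 5 is Capacity (e.g. 22%)
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

pct=$(usage_percent /)
if [ "$pct" -ge 90 ]; then
  echo "warning: / is ${pct}% full"
else
  echo "/ is ${pct}% full"
fi
```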

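For the disk to be mounted at boot, we need to edit /etc/fstab. Be very careful when editing this file, because a mistake can leave the system unable to boot. Each entry lists the device, the mount point, the file system type and the options, followed by the dump and fsck-order fields. Using the UUID shown by lsblk -f (UUID=0a7c861c-1a31-45b3-bf37-c72229f35704) is more robust than /dev/sdb1, since device names can change. After editing, run mount -a to catch mistakes before you reboot.

[root@localhost ~]# echo "/dev/sdb1 /data xfs defaults 0 0" >> /etc/fstab
[root@localhost ~]# cat /etc/fstab
/dev/mapper/centos-root / xfs defaults 0 0
UUID=ac3a3ce8-6ab1-4c0b-91c8-b4e617f0dfb4 /boot xfs defaults 0 0
/dev/mapper/centos-swap swap swap defaults 0 0
/dev/sdb1 /data xfs defaults 0 0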

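10.5 Swap

Because memory capacity is limited, modern operating systems use part of the disk to emulate an extra area of virtual memory for holding some data. Since a disk is nowhere near as fast as memory, this usually makes applications stutter. Stuttering or not, it does let the application processes hang on a little longer.

The swap partition (swap area) is what the system swaps with when physical memory runs short. Memory pages that are not currently needed are written out to this area on disk, which frees physical memory for the programs that are running. When the swapped-out programs run again, their saved data is read back from the swap partition into memory.

Modern internet companies usually have strict requirements on API response times, so swap is generally disabled. Here are a few swap-related commands.

# Create a swap area (this overwrites whatever is on /dev/sdb1)
[root@localhost ~]# mkswap /dev/sdb1
# Disable all swap areas
[root@localhost ~]# swapoff -a
# Enable the swap area
[root@localhost ~]# swapon /dev/sdb1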

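10.6 Backing up with dd

The command below backs up disk A directly to disk B.

# dd if=/dev/sda of=/dev/sdb

In the command above, if stands for the input file and of for the output file. Because everything in Linux is a file, the "files" here are the devices themselves.

dd can also pack a whole disk into an image file, as in the command below.

# dd if=/dev/hda of=~/hdadisk.img

Restoring the disk is just as simple: we only need to reverse the operation.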

# dd if=hdadisk.img of=/dev/hda
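A mistyped of= can wipe the wrong disk, so it can help to rehearse the round trip on image files first. The sketch below treats disk.img as the "disk" and backup.img as the image; both names are made up. It then verifies the restore with cmp. It assumes GNU dd, for status=none and conv=fsync.

```shell
#!/usr/bin/env bash
# Rehearse a dd backup and restore on image files instead of real disks.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

dd if=/dev/urandom of=disk.img bs=1M count=8 status=none           # pretend "disk"
dd if=disk.img of=backup.img bs=4M conv=fsync status=none           # back it up
dd if=/dev/zero of=disk.img bs=1M count=8 conv=notrunc status=none  # "lose" the data
dd if=backup.img of=disk.img bs=4M conv=fsync status=none           # restore

cmp disk.img backup.img && echo "restored disk matches the backup"
```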

End

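This article is about 60,000 characters long and has been through several revisions; some readers already use it as training material at their companies. By now you should have a reasonably objective picture of the Linux command line. I also have many articles that can take you further, such as on using vim, sed and awk.

Original source: https://mp.weixin.qq.com/s/ZralWEfG2WJfZ-G-x9biow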

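1,000 stars in 82 days: the project team's 8 things to watch when open-sourcing software

Recently we open-sourced SIA-TASK, a microservice task scheduling framework, on GitHub, and in 82 days it collected more than 1,000 stars! This was the SIA team's first open-source project, and the team had little prior experience with the work involved. So we wrote up everything we did this time as a reference for future open-source projects.

Key steps

Development

Licenses

Security scanning

Documentation

Version numbers

Open-sourcing

Post-release support

Iteration

Below we go through the steps one by one.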

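1. Development

Keep the following points in mind while developing an open-source project:

First, give the project a suitable name. We won't go into naming rules here. The one point to stress is that the name must not be the same as that of a project already open-sourced on GitHub.

Second, choose a suitable programming language.

Third, follow coding conventions while writing the code.

Finally, choose an open-source license. The six most popular open-source licenses today are GPL, BSD, MIT, Mozilla, Apache and LGPL.

These licenses differ quite a lot. For help choosing, see "Understanding open-source licenses in one chart" (https://blog.csdn.net/cwt19902010/article/details/53736746). If none of these common licenses suits your project, you can also write a license of your own.

For easier reference, the license-selection chart is shown below.

Taking Apache License Version 2.0 as an example, the chart below shows which common licenses conflict with the Apache license: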

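2. Licenses

Once development is finished, list the licenses used in the project, including those of the components it depends on. The Maven license plugin is recommended for this; see License Maven plugin (https://www.mojohaus.org/license-maven-plugin/) for its configuration. An example configuration in the main pom follows (the project license here is Apache 2.0):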

<licenses>
    <license>
        <name>Apache License, Version 2.0</name>
        <url>http://www.apache.org/licenses/LICENSE-2.0.html</url>
        <distribution>repo</distribution>
    </license>
</licenses>
<!-- the plugins below go inside <build> in the main pom.xml -->
<plugins>
    <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>license-maven-plugin</artifactId>
        <version>1.13</version>
        <configuration>
            <outputDirectory>${main.basedir}</outputDirectory>
            <thirdPartyFilename>LICENSE-3RD-PARTY</thirdPartyFilename>
            <fileTemplate>/org/codehaus/mojo/license/third-party-file-groupByLicense.ftl</fileTemplate>
            <useMissingFile>true</useMissingFile>
            <missingFile>${main.basedir}/LICENSE-3RD-PARTY.properties</missingFile>
            <aggregateMissingLicensesFile>${main.basedir}/LICENSE-3RD-PARTY.properties</aggregateMissingLicensesFile>
            <licenseMerges>
                <licenseMerge>Apache 2.0|ASL, version 2|http://www.apache.org/licenses/LICENSE-2.0.txt|http://asm.ow2.org/license.html|The Apache License, Version 2.0|Apache License|Apache License Version 2|Apache License Version 2.0|Apache Software License - Version 2.0|Apache 2.0 License|Apache License 2.0|ASL|Apache 2|Apache-2.0|the Apache License, ASL Version 2.0|The Apache Software License, Version 2.0|Apache License, Version 2.0|Apache Public License 2.0</licenseMerge>
                <licenseMerge>BSD|The BSD 3-Clause License|The BSD License|Modified BSD License|New BSD License|New BSD license|Two-clause BSD-style license|BSD licence|BSD New|The New BSD License|BSD 3-Clause|BSD 3-clause</licenseMerge>
                <licenseMerge>MIT|MIT License|The MIT License</licenseMerge>
                <licenseMerge>LGPL|LGPL, version 2.1|GNU Library or Lesser General Public License (LGPL) V2.1|GNU Lesser General Public License (LGPL), Version 2.1|GNU Lesser General Public License, Version 2.1|LGPL 2.1</licenseMerge>
                <licenseMerge>CDDL|CDDL+GPL|CDDL+GPL License|CDDL + GPLv2 with classpath exception|CDDL License|CDDL 1.0|CDDL 1.1|COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.0|Common Development and Distribution License (CDDL) v1.0</licenseMerge>
                <licenseMerge>EPL|Eclipse Public License - Version 1.0</licenseMerge>
                <licenseMerge>GPL|GPL2 w/ CPE|GPLv2+CE|GNU General Public Library</licenseMerge>
                <licenseMerge>MPL|MPL 1.1</licenseMerge>
                <licenseMerge>Public Domain</licenseMerge>
                <licenseMerge>Common Public License|Common Public License Version 1.0</licenseMerge>
                <licenseMerge>CC0|CC0 1.0 Universal|Public Domain, per Creative Commons CC0</licenseMerge>
                <licenseMerge>Unknown License|Unknown license</licenseMerge>
            </licenseMerges>
            <aggregateDownloadLicenses.executeOnlyOnRootModule>true</aggregateDownloadLicenses.executeOnlyOnRootModule>
            <licensesOutputFile>${main.basedir}/lic/licenses.xml</licensesOutputFile>
            <licensesOutputDirectory>${main.basedir}/lic/licenses/</licensesOutputDirectory>
            <licenseName>apache_v2</licenseName>
            <inceptionYear>2019</inceptionYear>
            <organizationName>sia</organizationName>
            <projectName>task</projectName>
            <roots>
                <root>src/main/java</root>
                <root>src/test/java</root>
            </roots>
            <includes>
                <include>**/*.java</include>
                <include>**/*.xml</include>
                <include>**/*.sh</include>
                <include>**/*.py</include>
                <include>**/*.properties</include>
                <include>**/*.sql</include>
                <include>**/*.html</include>
                <include>**/*.less</include>
                <include>**/*.css</include>
                <include>**/*.js</include>
                <include>**/*.json</include>
            </includes>
            <canUpdateCopyright>true</canUpdateCopyright>
            <canUpdateDescription>true</canUpdateDescription>
            <addJavaLicenseAfterPackage>false</addJavaLicenseAfterPackage>
            <emptyLineAfterHeader>true</emptyLineAfterHeader>
            <processStartTag>&lt;&lt;</processStartTag>
            <processEndTag>&gt;&gt;</processEndTag>
            <sectionDelimiter>==</sectionDelimiter>
            <licenseFile>${main.basedir}/LICENSE</licenseFile>
        </configuration>
    </plugin>
    <plugin>
        <groupId>org.jasig.maven</groupId>
        <artifactId>maven-notice-plugin</artifactId>
        <version>1.0.6.1</version>
        <configuration>
            <generateChildNotices>false</generateChildNotices>
            <noticeTemplate>https://source.jasig.org/licenses/NOTICE.template</noticeTemplate>
            <licenseMapping>
                <param>https://source.jasig.org/licenses/license-mappings.xml</param>
            </licenseMapping>
        </configuration>
    </plugin>
</plugins>

Once configured, run the commands below to generate the corresponding license files:

#### Updates (or creates) the main project license file according to the license defined in the licenseName parameter.
mvn license:update-project-license
#### Generates a file containing a list of all dependencies and their licenses for a multi-module build.
mvn license:aggregate-add-third-party
#### Downloads the license files associated with each dependency for a multi-modules build.
mvn license:aggregate-download-licenses
#### Generates the NOTICE file.
mvn notice:generate

When a project is open-sourced, a license header should be added at the top of each source file. The commands to add/update, check and remove these headers are:

#### How to generate/update the source code headers
## Updates the license header of the current project source files.
mvn license:update-file-header
## Checks the license header of the current project source files.
mvn license:check-file-header
## Removes any license header of the current project source files.
mvn license:remove-file-header

Running these commands generates several license files. Two of them are key:

LICENSE: the license of the open-source project itself.
LICENSE-3RD-PARTY: the licenses used by the project's dependencies.

Look up each dependency's license in LICENSE-3RD-PARTY and compare it with the license-conflict chart above, to see whether it conflicts with the license chosen for the project. If it does, replace the conflicting component.

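3. Security scanning

Security scanning is an indispensable step in the open-sourcing process. It mainly looks at the following:

Component-level security issues, for example whether a component carries a remote code execution risk, an XXE risk, and so on.

Code-level security issues, for example a @RequestMapping that does not restrict the allowed HTTP methods.

Leaks of sensitive company information, for example exposed database connection details or email account details.

Note: the security scan was carried out by colleagues on the security service team of our security department. Once development is finished, contact that team to have the code scanned.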

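4. Documentation

The README is the face of an open-source project. A well-written README helps users understand what the project does and lowers the cost of using it. A project with a good README is not necessarily a good project, but a good project always has a good README.

Below is a brief guide to writing a README. Based on the READMEs of many large open-source projects on GitHub, I think a README consists mainly of the following parts:

1) Introduction

The introduction lets people understand the project quickly. It mainly covers the project's background and a short summary.

2) Architecture

The architecture section describes how the project is implemented, so that users understand how it works.

3) Integration

This is the development guide: describe how the project is deployed, or how its jar is used.

4) User guide

The user guide tells users how to use the project. Ideally, include a screenshot for each step; this cuts down later back-and-forth with users.

5) Release notes

Tell users which version is the most stable.

6) Copyright notice

Copyright information allows the authors to defend their rights and protects their legitimate interests.

7) Contact

Leave the WeChat, Weibo, email or other contact details of the author or organization, so users can follow up on technical questions.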

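5. Versioning

A project open-sourced on GitHub needs a version number in the format MAJOR.MINOR.PATCH, incremented as follows:

MAJOR: when you make incompatible API changes;

MINOR: when you add functionality in a backward-compatible manner;

PATCH: when you make backward-compatible bug fixes.

Labels for pre-release versions and build metadata can be appended to MAJOR.MINOR.PATCH as extensions.

More precisely, a normal version number must take the form X.Y.Z. X, Y and Z are non-negative integers and must not contain leading zeros. X is the major version, Y the minor version and Z the patch version. Each element must increase numerically, for example: 1.9.1 -> 1.10.0 -> 1.11.0.

Note: these versioning rules are quoted from Semantic Versioning 2.0.0: https://semver.org/lang/zh-CN/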
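Two consequences of these rules are easy to check in a script: the "no leading zeros" constraint, and the fact that versions must be compared number by number rather than as text. The sketch below uses a hypothetical helper, is_core_semver. Its regex covers only the core X.Y.Z form and leaves out SemVer's pre-release (-alpha) and build-metadata (+meta) suffixes.

```shell
#!/usr/bin/env bash
# Check the core X.Y.Z form of a version string and sort versions numerically.
set -euo pipefail

is_core_semver() {
  local re='^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
  [[ $1 =~ $re ]]
}

for v in 1.9.1 1.10.0 1.11.0 01.2.3 1.2; do
  if is_core_semver "$v"; then echo "$v valid"; else echo "$v invalid"; fi
done

# Plain sort compares text, so 1.10.0 would come before 1.9.1;
# sort -V (GNU coreutils) compares each numeric part as a number.
printf '%s\n' 1.11.0 1.9.1 1.10.0 | sort -V
```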

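6. Open-sourcing

Once the steps above are done, we can upload the project to GitHub and open-source it. Plenty of articles online explain how to use GitHub, so I won't repeat them here. See, for example, this guide to the daily workflow of contributing to open-source projects on GitHub: https://blog.csdn.net/five3/article/details/9307041

7. Post-release support

Maintenance after release is the part of open-sourcing that is most easily neglected. To help users get the most out of the project, we support them through GitHub issues, a WeChat Q&A group, forums, community articles and other interactive channels.

8. Iteration

The iterative development workflow on GitHub works like this. The project owner grants developers member permission. A member forks the project into their own account, makes changes in the fork, and opens a pull request when finished; only the project owner can merge it. For how to keep a fork in sync, see: https://blog.csdn.net/t111t/article/details/45894381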
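Keeping a fork in sync usually comes down to adding the original repository as a second remote and merging from it. The sketch below runs that flow entirely on local throwaway repositories, so it needs no network. The names upstream, fork.git and work are made up, and git is assumed to be installed. On GitHub, the upstream remote would point at the original project's URL instead.

```shell
#!/usr/bin/env bash
# Keep a fork in sync with the original ("upstream") repository, using local repos.
set -euo pipefail

work_root=$(mktemp -d)
trap 'rm -rf "$work_root"' EXIT
cd "$work_root"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream                                   # plays the original project
git -C upstream commit -q --allow-empty -m "initial commit"
git clone -q --bare upstream fork.git                  # plays your fork on GitHub
git clone -q fork.git work                             # your local working copy

git -C upstream commit -q --allow-empty -m "new work upstream"   # upstream moves on

cd work
branch=$(git symbolic-ref --short HEAD)        # master or main, depending on git config
git remote add upstream "$work_root/upstream"  # one-time setup
git fetch -q upstream
git merge -q --ff-only "upstream/$branch"      # bring in upstream's commits
git push -q origin "$branch"                   # update the fork as well
git log --oneline | wc -l                      # 2 commits after syncing
```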

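Open-source projects:

Microservice task scheduling framework: https://github.com/siaorg/sia-task

Microservice routing gateway: https://github.com/siaorg/sia-gateway

Author: 张丽君 (Zhang Lijun)

Source: 宜信技术学院 (CreditEase Technology Institute)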

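Fetching Douyin videos with PHP from a Douyin share link

Douyin is a very popular short-video platform with lots of good short videos. Today I'll show you how to use PHP to fetch content from Douyin.

1: Open Douyin, pick a video you like, tap Share and copy the link. You'll get something like this:

#Kobe, may the place you went to also have basketball, and may you still wear the purple-and-gold No. 24 jersey! #LiveWallpaper https://v.douyin.com/36xkCS/ Copy this link and open [Douyin] to watch the video directly!

This text contains the link we'll use to fetch the video.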
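Before any HTML parsing, the link has to be pulled out of the copied share text. Here is a minimal shell sketch. The sample string is an English rendering of the share text above. The regex simply takes the first http(s) URL; that is an assumption about the share format, not something Douyin documents.

```shell
#!/usr/bin/env bash
# Extract the first http(s) URL from a copied share text.
set -euo pipefail

share_text='#Kobe, may the place you went to also have basketball, and may you still wear the purple-and-gold No. 24 jersey! #LiveWallpaper https://v.douyin.com/36xkCS/ Copy this link and open [Douyin] to watch the video directly!'

url=$(printf '%s\n' "$share_text" | grep -oE 'https?://[^[:space:]]+' | head -n 1)
echo "$url"   # https://v.douyin.com/36xkCS/
```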

2: We need simple_html_dom, a PHP library for parsing HTML.

Create simple_html_dom.php with the following code:

1 <?php 2 /** 3 * Website: http://sourceforge.net/projects/simplehtmldom/ 4 * Acknowledge: Jose Solorzano (https://sourceforge.net/projects/php-html/) 5 * Contributions by: 6 * Yousuke Kumakura (Attribute filters) 7 * Vadim Voituk (Negative indexes supports of "find" method) 8 * Antcs (Constructor with automatically load contents either text or file/url) 9 * 10 * all affected sections have comments starting with "PaperG" 11 * 12 * Paperg - Added case insensitive testing of the value of the selector. 13 * Paperg - Added tag_start for the starting index of tags - NOTE: This works but not accurately. 14 * This tag_start gets counted AFTER rn have been crushed out, and after the remove_noice calls so it will not reflect the REAL position of the tag in the source, 15 * it will almost always be smaller by some amount. 16 * We use this to determine how far into the file the tag in question is. This "percentage will never be accurate as the $dom->size is the "real" number of bytes the dom was created from. 17 * but for most purposes, it's a really good estimation. 18 * Paperg - Added the forceTagsClosed to the dom constructor. Forcing tags closed is great for malformed html, but it CAN lead to parsing errors. 19 * Allow the user to tell us how much they trust the html. 20 * Paperg add the text and plaintext to the selectors for the find syntax. plaintext implies text in the innertext of a node. text implies that the tag is a text node. 21 * This allows for us to find tags based on the text they contain. 22 * Create find_ancestor_tag to see if a tag is - at any level - inside of another specific tag. 23 * Paperg: added parse_charset so that we know about the character set of the source document. 
24 * NOTE: If the user's system has a routine called get_last_retrieve_url_contents_content_type availalbe, we will assume it's returning the content-type header from the 25 * last transfer or curl_exec, and we will parse that and use it in preference to any other method of charset detection. 26 * 27 * Found infinite loop in the case of broken html in restore_noise. Rewrote to protect from that. 28 * PaperG (John Schlick) Added get_display_size for "IMG" tags. 29 * 30 * Licensed under The MIT License 31 * Redistributions of files must retain the above copyright notice. 32 * 33 * @author S.C. Chen <me578022@gmail.com> 34 * @author John Schlick 35 * @author Rus Carroll 36 * @version 1.5 ($Rev: 196 $) 37 * @package PlaceLocalInclude 38 * @subpackage simple_html_dom 39 */ 40 41 /** 42 * All of the Defines for the classes below. 43 * @author S.C. Chen <me578022@gmail.com> 44 */ 45 define('HDOM_TYPE_ELEMENT', 1); 46 define('HDOM_TYPE_COMMENT', 2); 47 define('HDOM_TYPE_TEXT', 3); 48 define('HDOM_TYPE_ENDTAG', 4); 49 define('HDOM_TYPE_ROOT', 5); 50 define('HDOM_TYPE_UNKNOWN', 6); 51 define('HDOM_QUOTE_DOUBLE', 0); 52 define('HDOM_QUOTE_SINGLE', 1); 53 define('HDOM_QUOTE_NO', 3); 54 define('HDOM_INFO_BEGIN', 0); 55 define('HDOM_INFO_END', 1); 56 define('HDOM_INFO_QUOTE', 2); 57 define('HDOM_INFO_SPACE', 3); 58 define('HDOM_INFO_TEXT', 4); 59 define('HDOM_INFO_INNER', 5); 60 define('HDOM_INFO_OUTER', 6); 61 define('HDOM_INFO_ENDSPACE',7); 62 define('DEFAULT_TARGET_CHARSET', 'UTF-8'); 63 define('DEFAULT_BR_TEXT', "rn"); 64 define('DEFAULT_SPAN_TEXT', " "); 65 define('MAX_FILE_SIZE', 600000); 66 // helper functions 67 // ----------------------------------------------------------------------------- 68 // get html dom from file 69 // $maxlen is defined in the code as PHP_STREAM_COPY_ALL which is defined as -1. 
70 function file_get_html($url, $use_include_path = false, $context=null, $offset = -1, $maxLen=-1, $lowercase = true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT) 71 { 72 // We DO force the tags to be terminated. 73 $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $stripRN, $defaultBRText, $defaultSpanText); 74 // For sourceforge users: uncomment the next line and comment the retreive_url_contents line 2 lines down if it is not already done. 75 $contents = file_get_contents($url, $use_include_path, $context, $offset); 76 // Paperg - use our own mechanism for getting the contents as we want to control the timeout. 77 //$contents = retrieve_url_contents($url); 78 if (empty($contents) || strlen($contents) > MAX_FILE_SIZE) 79 { 80 return false; 81 } 82 // The second parameter can force the selectors to all be lowercase. 83 $dom->load($contents, $lowercase, $stripRN); 84 return $dom; 85 } 86 87 // get html dom from string 88 function str_get_html($str, $lowercase=true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT) 89 { 90 $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $stripRN, $defaultBRText, $defaultSpanText); 91 if (empty($str) || strlen($str) > MAX_FILE_SIZE) 92 { 93 $dom->clear(); 94 return false; 95 } 96 $dom->load($str, $lowercase, $stripRN); 97 return $dom; 98 } 99 100 // dump html dom tree 101 function dump_html_tree($node, $show_attr=true, $deep=0) 102 { 103 $node->dump($node); 104 } 105 106 107 /** 108 * simple html dom node 109 * PaperG - added ability for "find" routine to lowercase the value of the selector. 
110 * PaperG - added $tag_start to track the start position of the tag in the total byte index 111 * 112 * @package PlaceLocalInclude 113 */ 114 class simple_html_dom_node 115 { 116 public $nodetype = HDOM_TYPE_TEXT; 117 public $tag = 'text'; 118 public $attr = array(); 119 public $children = array(); 120 public $nodes = array(); 121 public $parent = null; 122 // The "info" array - see HDOM_INFO_... for what each element contains. 123 public $_ = array(); 124 public $tag_start = 0; 125 private $dom = null; 126 127 function __construct($dom) 128 { 129 $this->dom = $dom; 130 $dom->nodes[] = $this; 131 } 132 133 function __destruct() 134 { 135 $this->clear(); 136 } 137 138 function __toString() 139 { 140 return $this->outertext(); 141 } 142 143 // clean up memory due to php5 circular references memory leak... 144 function clear() 145 { 146 $this->dom = null; 147 $this->nodes = null; 148 $this->parent = null; 149 $this->children = null; 150 } 151 152 // dump node's tree 153 function dump($show_attr=true, $deep=0) 154 { 155 $lead = str_repeat(' ', $deep); 156 157 echo $lead.$this->tag; 158 if ($show_attr && count($this->attr)>0) 159 { 160 echo '('; 161 foreach ($this->attr as $k=>$v) 162 echo "[$k]=>"".$this->$k.'", '; 163 echo ')'; 164 } 165 echo "n"; 166 167 if ($this->nodes) 168 { 169 foreach ($this->nodes as $c) 170 { 171 $c->dump($show_attr, $deep 1); 172 } 173 } 174 } 175 176 177 // Debugging function to dump a single dom node with a bunch of information about it. 
178 function dump_node($echo=true) 179 { 180 181 $string = $this->tag; 182 if (count($this->attr)>0) 183 { 184 $string .= '('; 185 foreach ($this->attr as $k=>$v) 186 { 187 $string .= "[$k]=>"".$this->$k.'", '; 188 } 189 $string .= ')'; 190 } 191 if (count($this->_)>0) 192 { 193 $string .= ' $_ ('; 194 foreach ($this->_ as $k=>$v) 195 { 196 if (is_array($v)) 197 { 198 $string .= "[$k]=>("; 199 foreach ($v as $k2=>$v2) 200 { 201 $string .= "[$k2]=>"".$v2.'", '; 202 } 203 $string .= ")"; 204 } else { 205 $string .= "[$k]=>"".$v.'", '; 206 } 207 } 208 $string .= ")"; 209 } 210 211 if (isset($this->text)) 212 { 213 $string .= " text: (" . $this->text . ")"; 214 } 215 216 $string .= " HDOM_INNER_INFO: '"; 217 if (isset($node->_[HDOM_INFO_INNER])) 218 { 219 $string .= $node->_[HDOM_INFO_INNER] . "'"; 220 } 221 else 222 { 223 $string .= ' NULL '; 224 } 225 226 $string .= " children: " . count($this->children); 227 $string .= " nodes: " . count($this->nodes); 228 $string .= " tag_start: " . $this->tag_start; 229 $string .= "n"; 230 231 if ($echo) 232 { 233 echo $string; 234 return; 235 } 236 else 237 { 238 return $string; 239 } 240 } 241 242 // returns the parent of node 243 // If a node is passed in, it will reset the parent of the current node to that one. 244 function parent($parent=null) 245 { 246 // I am SURE that this doesn't work properly. 247 // It fails to unset the current node from it's current parents nodes or children list first. 
248 if ($parent !== null) 249 { 250 $this->parent = $parent; 251 $this->parent->nodes[] = $this; 252 $this->parent->children[] = $this; 253 } 254 255 return $this->parent; 256 } 257 258 // verify that node has children 259 function has_child() 260 { 261 return !empty($this->children); 262 } 263 264 // returns children of node 265 function children($idx=-1) 266 { 267 if ($idx===-1) 268 { 269 return $this->children; 270 } 271 if (isset($this->children[$idx])) return $this->children[$idx]; 272 return null; 273 } 274 275 // returns the first child of node 276 function first_child() 277 { 278 if (count($this->children)>0) 279 { 280 return $this->children[0]; 281 } 282 return null; 283 } 284 285 // returns the last child of node 286 function last_child() 287 { 288 if (($count=count($this->children))>0) 289 { 290 return $this->children[$count-1]; 291 } 292 return null; 293 } 294 295 // returns the next sibling of node 296 function next_sibling() 297 { 298 if ($this->parent===null) 299 { 300 return null; 301 } 302 303 $idx = 0; 304 $count = count($this->parent->children); 305 while ($idx<$count && $this!==$this->parent->children[$idx]) 306 { 307 $idx; 308 } 309 if ( $idx>=$count) 310 { 311 return null; 312 } 313 return $this->parent->children[$idx]; 314 } 315 316 // returns the previous sibling of node 317 function prev_sibling() 318 { 319 if ($this->parent===null) return null; 320 $idx = 0; 321 $count = count($this->parent->children); 322 while ($idx<$count && $this!==$this->parent->children[$idx]) 323 $idx; 324 if (--$idx<0) return null; 325 return $this->parent->children[$idx]; 326 } 327 328 // function to locate a specific ancestor tag in the path to the root. 329 function find_ancestor_tag($tag) 330 { 331 global $debugObject; 332 if (is_object($debugObject)) { $debugObject->debugLogEntry(1); } 333 334 // Start by including ourselves in the comparison. 
335 $returnDom = $this; 336 337 while (!is_null($returnDom)) 338 { 339 if (is_object($debugObject)) { $debugObject->debugLog(2, "Current tag is: " . $returnDom->tag); } 340 341 if ($returnDom->tag == $tag) 342 { 343 break; 344 } 345 $returnDom = $returnDom->parent; 346 } 347 return $returnDom; 348 } 349 350 // get dom node's inner html 351 function innertext() 352 { 353 if (isset($this->_[HDOM_INFO_INNER])) return $this->_[HDOM_INFO_INNER]; 354 if (isset($this->_[HDOM_INFO_TEXT])) return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]); 355 356 $ret = ''; 357 foreach ($this->nodes as $n) 358 $ret .= $n->outertext(); 359 return $ret; 360 } 361 362 // get dom node's outer text (with tag) 363 function outertext() 364 { 365 global $debugObject; 366 if (is_object($debugObject)) 367 { 368 $text = ''; 369 if ($this->tag == 'text') 370 { 371 if (!empty($this->text)) 372 { 373 $text = " with text: " . $this->text; 374 } 375 } 376 $debugObject->debugLog(1, 'Innertext of tag: ' . $this->tag . $text); 377 } 378 379 if ($this->tag==='root') return $this->innertext(); 380 381 // trigger callback 382 if ($this->dom && $this->dom->callback!==null) 383 { 384 call_user_func_array($this->dom->callback, array($this)); 385 } 386 387 if (isset($this->_[HDOM_INFO_OUTER])) return $this->_[HDOM_INFO_OUTER]; 388 if (isset($this->_[HDOM_INFO_TEXT])) return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]); 389 390 // render begin tag 391 if ($this->dom && $this->dom->nodes[$this->_[HDOM_INFO_BEGIN]]) 392 { 393 $ret = $this->dom->nodes[$this->_[HDOM_INFO_BEGIN]]->makeup(); 394 } else { 395 $ret = ""; 396 } 397 398 // render inner text 399 if (isset($this->_[HDOM_INFO_INNER])) 400 { 401 // If it's a br tag... don't return the HDOM_INNER_INFO that we may or may not have added. 
402 if ($this->tag != "br") 403 { 404 $ret .= $this->_[HDOM_INFO_INNER]; 405 } 406 } else { 407 if ($this->nodes) 408 { 409 foreach ($this->nodes as $n) 410 { 411 $ret .= $this->convert_text($n->outertext()); 412 } 413 } 414 } 415 416 // render end tag 417 if (isset($this->_[HDOM_INFO_END]) && $this->_[HDOM_INFO_END]!=0) 418 $ret .= '</'.$this->tag.'>'; 419 return $ret; 420 } 421 422 // get dom node's plain text 423 function text() 424 { 425 if (isset($this->_[HDOM_INFO_INNER])) return $this->_[HDOM_INFO_INNER]; 426 switch ($this->nodetype) 427 { 428 case HDOM_TYPE_TEXT: return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]); 429 case HDOM_TYPE_COMMENT: return ''; 430 case HDOM_TYPE_UNKNOWN: return ''; 431 } 432 if (strcasecmp($this->tag, 'script')===0) return ''; 433 if (strcasecmp($this->tag, 'style')===0) return ''; 434 435 $ret = ''; 436 // In rare cases, (always node type 1 or HDOM_TYPE_ELEMENT - observed for some span tags, and some p tags) $this->nodes is set to NULL. 437 // NOTE: This indicates that there is a problem where it's set to NULL without a clear happening. 438 // WHY is this happening? 439 if (!is_null($this->nodes)) 440 { 441 foreach ($this->nodes as $n) 442 { 443 $ret .= $this->convert_text($n->text()); 444 } 445 446 // If this node is a span... add a space at the end of it so multiple spans don't run into each other. This is plaintext after all. 
            if ($this->tag == "span")
            {
                $ret .= $this->dom->default_span_text;
            }

        }
        return $ret;
    }

    function xmltext()
    {
        $ret = $this->innertext();
        $ret = str_ireplace('<![CDATA[', '', $ret);
        $ret = str_replace(']]>', '', $ret);
        return $ret;
    }

    // build node's text with tag
    function makeup()
    {
        // text, comment, unknown
        if (isset($this->_[HDOM_INFO_TEXT])) return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);

        $ret = '<'.$this->tag;
        $i = -1;

        foreach ($this->attr as $key=>$val)
        {
            ++$i;

            // skip removed attribute
            if ($val===null || $val===false)
                continue;

            $ret .= $this->_[HDOM_INFO_SPACE][$i][0];
            //no value attr: nowrap, checked selected...
            if ($val===true)
                $ret .= $key;
            else {
                switch ($this->_[HDOM_INFO_QUOTE][$i])
                {
                    case HDOM_QUOTE_DOUBLE: $quote = '"'; break;
                    case HDOM_QUOTE_SINGLE: $quote = '\''; break;
                    default: $quote = '';
                }
                $ret .= $key.$this->_[HDOM_INFO_SPACE][$i][1].'='.$this->_[HDOM_INFO_SPACE][$i][2].$quote.$val.$quote;
            }
        }
        $ret = $this->dom->restore_noise($ret);
        return $ret . $this->_[HDOM_INFO_ENDSPACE] . '>';
    }

    // find elements by css selector
    //PaperG - added ability for find to lowercase the value of the selector.
    function find($selector, $idx=null, $lowercase=false)
    {
        $selectors = $this->parse_selector($selector);
        if (($count=count($selectors))===0) return array();
        $found_keys = array();

        // find each selector
        for ($c=0; $c<$count; ++$c)
        {
            // The change on the below line was documented on the sourceforge code tracker id 2788009
            // used to be: if (($levle=count($selectors[0]))===0) return array();
            if (($levle=count($selectors[$c]))===0) return array();
            if (!isset($this->_[HDOM_INFO_BEGIN])) return array();

            $head = array($this->_[HDOM_INFO_BEGIN]=>1);

            // handle descendant selectors, no recursive!
            for ($l=0; $l<$levle; ++$l)
            {
                $ret = array();
                foreach ($head as $k=>$v)
                {
                    $n = ($k===-1) ? $this->dom->root : $this->dom->nodes[$k];
                    //PaperG - Pass this optional parameter on to the seek function.
                    $n->seek($selectors[$c][$l], $ret, $lowercase);
                }
                $head = $ret;
            }

            foreach ($head as $k=>$v)
            {
                if (!isset($found_keys[$k]))
                    $found_keys[$k] = 1;
            }
        }

        // sort keys
        ksort($found_keys);

        $found = array();
        foreach ($found_keys as $k=>$v)
            $found[] = $this->dom->nodes[$k];

        // return nth-element or array
        if (is_null($idx)) return $found;
        else if ($idx<0) $idx = count($found) + $idx;
        return (isset($found[$idx])) ? $found[$idx] : null;
    }

    // seek for given conditions
    // PaperG - added parameter to allow for case insensitive testing of the value of a selector.
    protected function seek($selector, &$ret, $lowercase=false)
    {
        global $debugObject;
        if (is_object($debugObject)) { $debugObject->debugLogEntry(1); }

        list($tag, $key, $val, $exp, $no_key) = $selector;

        // xpath index
        if ($tag && $key && is_numeric($key))
        {
            $count = 0;
            foreach ($this->children as $c)
            {
                if ($tag==='*' || $tag===$c->tag) {
                    if (++$count==$key) {
                        $ret[$c->_[HDOM_INFO_BEGIN]] = 1;
                        return;
                    }
                }
            }
            return;
        }

        $end = (!empty($this->_[HDOM_INFO_END])) ? $this->_[HDOM_INFO_END] : 0;
        if ($end==0) {
            $parent = $this->parent;
            while (!isset($parent->_[HDOM_INFO_END]) && $parent!==null) {
                $end -= 1;
                $parent = $parent->parent;
            }
            $end += $parent->_[HDOM_INFO_END];
        }

        for ($i=$this->_[HDOM_INFO_BEGIN]+1; $i<$end; ++$i) {
            $node = $this->dom->nodes[$i];

            $pass = true;

            if ($tag==='*' && !$key) {
                if (in_array($node, $this->children, true))
                    $ret[$i] = 1;
                continue;
            }

            // compare tag
            if ($tag && $tag!=$node->tag && $tag!=='*') {$pass=false;}
            // compare key
            if ($pass && $key) {
                if ($no_key) {
                    if (isset($node->attr[$key])) $pass=false;
                } else {
                    if (($key != "plaintext") && !isset($node->attr[$key])) $pass=false;
                }
            }
            // compare value
            if ($pass && $key && $val && $val!=='*') {
                // If they have told us that this is a "plaintext" search then we want the plaintext of the node - right?
                if ($key == "plaintext") {
                    // $node->plaintext actually returns $node->text();
                    $nodeKeyValue = $node->text();
                } else {
                    // this is a normal search, we want the value of that attribute of the tag.
                    $nodeKeyValue = $node->attr[$key];
                }
                if (is_object($debugObject)) {$debugObject->debugLog(2, "testing node: " . $node->tag . " for attribute: " . $key . $exp . $val . " where nodes value is: " .
                    $nodeKeyValue);}

                //PaperG - If lowercase is set, do a case insensitive test of the value of the selector.
                if ($lowercase) {
                    $check = $this->match($exp, strtolower($val), strtolower($nodeKeyValue));
                } else {
                    $check = $this->match($exp, $val, $nodeKeyValue);
                }
                if (is_object($debugObject)) {$debugObject->debugLog(2, "after match: " . ($check ? "true" : "false"));}

                // handle multiple class
                if (!$check && strcasecmp($key, 'class')===0) {
                    foreach (explode(' ',$node->attr[$key]) as $k) {
                        // Without this, there were cases where leading, trailing, or double spaces lead to our comparing blanks - bad form.
                        if (!empty($k)) {
                            if ($lowercase) {
                                $check = $this->match($exp, strtolower($val), strtolower($k));
                            } else {
                                $check = $this->match($exp, $val, $k);
                            }
                            if ($check) break;
                        }
                    }
                }
                if (!$check) $pass = false;
            }
            if ($pass) $ret[$i] = 1;
            unset($node);
        }
        // It's passed by reference so this is actually what this function returns.
        if (is_object($debugObject)) {$debugObject->debugLog(1, "EXIT - ret: ", $ret);}
    }

    protected function match($exp, $pattern, $value) {
        global $debugObject;
        if (is_object($debugObject)) {$debugObject->debugLogEntry(1);}

        switch ($exp) {
            case '=':
                return ($value===$pattern);
            case '!=':
                return ($value!==$pattern);
            case '^=':
                return preg_match("/^".preg_quote($pattern,'/')."/", $value);
            case '$=':
                return preg_match("/".preg_quote($pattern,'/')."$/", $value);
            case '*=':
                if ($pattern[0]=='/') {
                    return preg_match($pattern, $value);
                }
                return preg_match("/".$pattern."/i", $value);
        }
        return false;
    }

    protected function parse_selector($selector_string) {
        global $debugObject;
        if (is_object($debugObject)) {$debugObject->debugLogEntry(1);}

        // pattern of CSS selectors, modified from mootools
        // Paperg: Add the colon to the attribute, so that it properly finds <tag attr:ibute="something" > like google does.
        // Note: if you try to look at this attribute, you MUST use getAttribute since $dom->x:y will fail the php syntax check.
        // Notice the \[ starting the attribute? and the @? following? This implies that an attribute can begin with an @ sign that is not captured.
        // This implies that an html attribute specifier may start with an @ sign that is NOT captured by the expression.
        // Further study is required to determine if this should be documented or removed.
        // $pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
        // The hyphen in [\w\-:...] is escaped so the class also compiles under PCRE2 (PHP 7.3+), which rejects "\w-:" as an invalid range.
        $pattern = "/([\w\-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w\-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
        preg_match_all($pattern, trim($selector_string).' ', $matches, PREG_SET_ORDER);
        if (is_object($debugObject)) {$debugObject->debugLog(2, "Matches Array: ", $matches);}

        $selectors = array();
        $result = array();
        //print_r($matches);

        foreach ($matches as $m) {
            $m[0] = trim($m[0]);
            if ($m[0]==='' || $m[0]==='/' || $m[0]==='//') continue;
            // for browser generated xpath
            if ($m[1]==='tbody') continue;

            list($tag, $key, $val, $exp, $no_key) = array($m[1], null, null, '=', false);
            if (!empty($m[2])) {$key='id'; $val=$m[2];}
            if (!empty($m[3])) {$key='class'; $val=$m[3];}
            if (!empty($m[4])) {$key=$m[4];}
            if (!empty($m[5])) {$exp=$m[5];}
            if (!empty($m[6])) {$val=$m[6];}

            // convert to lowercase
            if ($this->dom->lowercase) {$tag=strtolower($tag); $key=strtolower($key);}
            //elements that do NOT have the specified attribute
            if (isset($key[0]) && $key[0]==='!') {$key=substr($key, 1); $no_key=true;}

            $result[] = array($tag, $key, $val, $exp, $no_key);
            if (trim($m[7])===',') {
                $selectors[] = $result;
                $result = array();
            }
        }
        if (count($result)>0)
            $selectors[] = $result;
        return $selectors;
    }

    function __get($name) {
        if (isset($this->attr[$name]))
        {
            return $this->convert_text($this->attr[$name]);
        }
        switch ($name) {
            case 'outertext': return $this->outertext();
            case 'innertext': return $this->innertext();
            case 'plaintext': return $this->text();
            case 'xmltext': return $this->xmltext();
            default: return array_key_exists($name, $this->attr);
        }
    }

    function __set($name, $value) {
        switch ($name) {
            case 'outertext': return $this->_[HDOM_INFO_OUTER] = $value;
            case 'innertext':
                if (isset($this->_[HDOM_INFO_TEXT])) return $this->_[HDOM_INFO_TEXT] = $value;
                return $this->_[HDOM_INFO_INNER] = $value;
        }
        if (!isset($this->attr[$name])) {
            $this->_[HDOM_INFO_SPACE][] = array(' ', '', '');
            $this->_[HDOM_INFO_QUOTE][] =
                HDOM_QUOTE_DOUBLE;
        }
        $this->attr[$name] = $value;
    }

    function __isset($name) {
        switch ($name) {
            case 'outertext': return true;
            case 'innertext': return true;
            case 'plaintext': return true;
        }
        //no value attr: nowrap, checked selected...
        return (array_key_exists($name, $this->attr)) ? true : isset($this->attr[$name]);
    }

    function __unset($name) {
        if (isset($this->attr[$name]))
            unset($this->attr[$name]);
    }

    // PaperG - Function to convert the text from one character set to another if the two sets are not the same.
    function convert_text($text)
    {
        global $debugObject;
        if (is_object($debugObject)) {$debugObject->debugLogEntry(1);}

        $converted_text = $text;

        $sourceCharset = "";
        $targetCharset = "";

        if ($this->dom)
        {
            $sourceCharset = strtoupper($this->dom->_charset);
            $targetCharset = strtoupper($this->dom->_target_charset);
        }
        if (is_object($debugObject)) {$debugObject->debugLog(3, "source charset: " . $sourceCharset . " target charset: " . $targetCharset);}

        if (!empty($sourceCharset) && !empty($targetCharset) && (strcasecmp($sourceCharset, $targetCharset) != 0))
        {
            // Check if the reported encoding could have been incorrect and the text is actually already UTF-8
            if ((strcasecmp($targetCharset, 'UTF-8') == 0) && ($this->is_utf8($text)))
            {
                $converted_text = $text;
            }
            else
            {
                $converted_text = iconv($sourceCharset, $targetCharset, $text);
            }
        }

        // Let's make sure that we don't have that silly BOM issue with any of the utf-8 text we output.
        if ($targetCharset == 'UTF-8')
        {
            if (substr($converted_text, 0, 3) == "\xef\xbb\xbf")
            {
                $converted_text = substr($converted_text, 3);
            }
            if (substr($converted_text, -3) == "\xef\xbb\xbf")
            {
                $converted_text = substr($converted_text, 0, -3);
            }
        }

        return $converted_text;
    }

    /**
     * Returns true if $str is valid UTF-8 and false otherwise.
     *
     * @param mixed $str String to be tested
     * @return boolean
     */
    static function is_utf8($str)
    {
        $c=0; $b=0;
        $bits=0;
        $len=strlen($str);
        for($i=0; $i<$len; $i++)
        {
            $c=ord($str[$i]);
            if($c > 128)
            {
                if(($c >= 254)) return false;
                elseif($c >= 252) $bits=6;
                elseif($c >= 248) $bits=5;
                elseif($c >= 240) $bits=4;
                elseif($c >= 224) $bits=3;
                elseif($c >= 192) $bits=2;
                else return false;
                if(($i+$bits) > $len) return false;
                while($bits > 1)
                {
                    $i++;
                    $b=ord($str[$i]);
                    if($b < 128 || $b > 191) return false;
                    $bits--;
                }
            }
        }
        return true;
    }
    /*
    function is_utf8($string)
    {
        //this is buggy
        return (utf8_encode(utf8_decode($string)) == $string);
    }
    */

    /**
     * Function to try a few tricks to determine the displayed size of an img on the page.
     * NOTE: This will ONLY work on an IMG tag. Returns FALSE on all other tag types.
     *
     * @author John Schlick
     * @version April 19 2012
     * @return array an array containing the 'height' and 'width' of the image on the page or -1 if we can't figure it out.
     */
    function get_display_size()
    {
        global $debugObject;

        $width = -1;
        $height = -1;

        if ($this->tag !== 'img')
        {
            return false;
        }

        // See if there is a height or width attribute in the tag itself.
        if (isset($this->attr['width']))
        {
            $width = $this->attr['width'];
        }

        if (isset($this->attr['height']))
        {
            $height = $this->attr['height'];
        }

        // Now look for an inline style.
        if (isset($this->attr['style']))
        {
            // Thanks to user gnarf from stackoverflow for this regular expression.
            $attributes = array();
            preg_match_all("/([\w-]+)\s*:\s*([^;]+)\s*;?/", $this->attr['style'], $matches, PREG_SET_ORDER);
            foreach ($matches as $match) {
                $attributes[$match[1]] = $match[2];
            }

            // If there is a width in the style attributes:
            if (isset($attributes['width']) && $width == -1)
            {
                // check that the last two characters are px (pixels)
                if (strtolower(substr($attributes['width'], -2)) == 'px')
                {
                    $proposed_width = substr($attributes['width'], 0, -2);
                    // Now make sure that it's an integer and not something stupid.
                    if (filter_var($proposed_width, FILTER_VALIDATE_INT))
                    {
                        $width = $proposed_width;
                    }
                }
            }

            // If there is a height in the style attributes:
            if (isset($attributes['height']) && $height == -1)
            {
                // check that the last two characters are px (pixels)
                if (strtolower(substr($attributes['height'], -2)) == 'px')
                {
                    $proposed_height = substr($attributes['height'], 0, -2);
                    // Now make sure that it's an integer and not something stupid.
                    if (filter_var($proposed_height, FILTER_VALIDATE_INT))
                    {
                        $height = $proposed_height;
                    }
                }
            }

        }

        // Future enhancement:
        // Look in the tag to see if there is a class or id specified that has a height or width attribute to it.

        // Far future enhancement
        // Look at all the parent tags of this image to see if they specify a class or id that has an img selector that specifies a height or width
        // Note that in this case, the class or id will have the img subselector for it to apply to the image.

        // ridiculously far future development
        // If the class or id is specified in a SEPARATE css file that's not on the page, go get it and do what we were just doing for the ones on the page.

        $result = array('height' => $height,
                        'width' => $width);
        return $result;
    }

    // camel naming conventions
    function getAllAttributes() {return $this->attr;}
    function getAttribute($name) {return $this->__get($name);}
    function setAttribute($name, $value) {$this->__set($name, $value);}
    function hasAttribute($name) {return $this->__isset($name);}
    function removeAttribute($name) {$this->__set($name, null);}
    function getElementById($id) {return $this->find("#$id", 0);}
    function getElementsById($id, $idx=null) {return $this->find("#$id", $idx);}
    function getElementByTagName($name) {return $this->find($name, 0);}
    function getElementsByTagName($name, $idx=null) {return $this->find($name, $idx);}
    function parentNode() {return $this->parent();}
    function childNodes($idx=-1) {return $this->children($idx);}
    function firstChild() {return $this->first_child();}
    function lastChild() {return $this->last_child();}
    function nextSibling() {return $this->next_sibling();}
    function previousSibling() {return $this->prev_sibling();}
    function hasChildNodes() {return $this->has_child();}
    function nodeName() {return $this->tag;}
    function appendChild($node) {$node->parent($this); return $node;}

}

/**
 * simple html dom parser
 * Paperg - in the find routine: allow us to specify that we want case insensitive testing of the value of the selector.
 * Paperg - change $size from protected to public so we can easily access it
 * Paperg - added ForceTagsClosed in the constructor which tells us whether we trust the html or not. Default is to NOT trust it.
 *
 * @package PlaceLocalInclude
 */
class simple_html_dom
{
    public $root = null;
    public $nodes = array();
    public $callback = null;
    public $lowercase = false;
    // Used to keep track of how large the text was when we started.
    public $original_size;
    public $size;
    protected $pos;
    protected $doc;
    protected $char;
    protected $cursor;
    protected $parent;
    protected $noise = array();
    protected $token_blank = " \t\r\n";
    protected $token_equal = ' =/>';
    protected $token_slash = " />\r\n\t";
    protected $token_attr = ' >';
    // Note that this is referenced by a child node, and so it needs to be public for that node to see this information.
    public $_charset = '';
    public $_target_charset = '';
    protected $default_br_text = "";
    public $default_span_text = "";

    // use isset instead of in_array, performance boost about 30%...
    protected $self_closing_tags = array('img'=>1, 'br'=>1, 'input'=>1, 'meta'=>1, 'link'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);
    protected $block_tags = array('root'=>1, 'body'=>1, 'form'=>1, 'div'=>1, 'span'=>1, 'table'=>1);
    // Known sourceforge issue #2977341
    // B tags that are not closed cause us to return everything to the end of the document.
    protected $optional_closing_tags = array(
        'tr'=>array('tr'=>1, 'td'=>1, 'th'=>1),
        'th'=>array('th'=>1),
        'td'=>array('td'=>1),
        'li'=>array('li'=>1),
        'dt'=>array('dt'=>1, 'dd'=>1),
        'dd'=>array('dd'=>1, 'dt'=>1),
        'dl'=>array('dd'=>1, 'dt'=>1),
        'p'=>array('p'=>1),
        'nobr'=>array('nobr'=>1),
        'b'=>array('b'=>1),
        'option'=>array('option'=>1),
    );

    function __construct($str=null, $lowercase=true, $forceTagsClosed=true, $target_charset=DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
    {
        if ($str)
        {
            if (preg_match("/^http:\/\//i",$str) || is_file($str))
            {
                $this->load_file($str);
            }
            else
            {
                $this->load($str, $lowercase, $stripRN, $defaultBRText, $defaultSpanText);
            }
        }
        // Forcing tags to be closed implies that we don't trust the html, but it can lead to parsing errors if we SHOULD trust the html.
        if (!$forceTagsClosed) {
            $this->optional_closing_array=array();
        }
        $this->_target_charset = $target_charset;
    }

    function __destruct()
    {
        $this->clear();
    }

    // load html from string
    function load($str, $lowercase=true, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
    {
        global $debugObject;

        // prepare
        $this->prepare($str, $lowercase, $stripRN, $defaultBRText, $defaultSpanText);
        // strip out comments
        $this->remove_noise("'<!--(.*?)-->'is");
        // strip out cdata
        $this->remove_noise("'<!\[CDATA\[(.*?)\]\]>'is", true);
        // Per sourceforge http://sourceforge.net/tracker/?func=detail&aid=2949097&group_id=218559&atid=1044037
        // Script tags removal now precedes style tag removal.
        // strip out <script> tags
        $this->remove_noise("'<\s*script[^>]*[^/]>(.*?)<\s*/\s*script\s*>'is");
        $this->remove_noise("'<\s*script\s*>(.*?)<\s*/\s*script\s*>'is");
        // strip out <style> tags
        $this->remove_noise("'<\s*style[^>]*[^/]>(.*?)<\s*/\s*style\s*>'is");
        $this->remove_noise("'<\s*style\s*>(.*?)<\s*/\s*style\s*>'is");
        // strip out preformatted tags
        $this->remove_noise("'<\s*(?:code)[^>]*>(.*?)<\s*/\s*(?:code)\s*>'is");
        // strip out server side scripts
        $this->remove_noise("'(<\?)(.*?)(\?>)'s", true);
        // strip smarty scripts
        $this->remove_noise("'(\{\w)(.*?)(\})'s", true);

        // parsing
        while ($this->parse());
        // end
        $this->root->_[HDOM_INFO_END] = $this->cursor;
        $this->parse_charset();

        // make load function chainable
        return $this;

    }

    // load html from file
    function load_file()
    {
        $args = func_get_args();
        $this->load(call_user_func_array('file_get_contents', $args), true);
        // Throw an error if we can't properly load the dom.
        if (($error=error_get_last())!==null) {
            $this->clear();
            return false;
        }
    }

    // set callback function
    function set_callback($function_name)
    {
        $this->callback = $function_name;
    }

    // remove callback function
    function remove_callback()
    {
        $this->callback = null;
    }

    // save dom as string
    function save($filepath='')
    {
        $ret = $this->root->innertext();
        if ($filepath!=='') file_put_contents($filepath, $ret, LOCK_EX);
        return $ret;
    }

    // find dom node by css selector
    // Paperg - allow us to specify that we want case insensitive testing of the value of the selector.
    function find($selector, $idx=null, $lowercase=false)
    {
        return $this->root->find($selector, $idx, $lowercase);
    }

    // clean up memory due to php5 circular references memory leak...
    function clear()
    {
        foreach ($this->nodes as $n) {$n->clear(); $n = null;}
        // This add next line is documented in the sourceforge repository. 2977248 as a fix for ongoing memory leaks that occur even with the use of clear.
        if (isset($this->children)) foreach ($this->children as $n) {$n->clear(); $n = null;}
        if (isset($this->parent)) {$this->parent->clear(); unset($this->parent);}
        if (isset($this->root)) {$this->root->clear(); unset($this->root);}
        unset($this->doc);
        unset($this->noise);
    }

    function dump($show_attr=true)
    {
        $this->root->dump($show_attr);
    }

    // prepare HTML data and init everything
    protected function prepare($str, $lowercase=true, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
    {
        $this->clear();

        // set the length of content before we do anything to it.
        $this->size = strlen($str);
        // Save the original size of the html that we got in.
        // It might be useful to someone.
        $this->original_size = $this->size;

        //before we save the string as the doc... strip out the \r \n's if we are told to.
        if ($stripRN) {
            $str = str_replace("\r", " ", $str);
            $str = str_replace("\n", " ", $str);

            // set the length of content since we have changed it.
            $this->size = strlen($str);
        }

        $this->doc = $str;
        $this->pos = 0;
        $this->cursor = 1;
        $this->noise = array();
        $this->nodes = array();
        $this->lowercase = $lowercase;
        $this->default_br_text = $defaultBRText;
        $this->default_span_text = $defaultSpanText;
        $this->root = new simple_html_dom_node($this);
        $this->root->tag = 'root';
        $this->root->_[HDOM_INFO_BEGIN] = -1;
        $this->root->nodetype = HDOM_TYPE_ROOT;
        $this->parent = $this->root;
        if ($this->size>0) $this->char = $this->doc[0];
    }

    // parse html content
    protected function parse()
    {
        if (($s = $this->copy_until_char('<'))==='')
        {
            return $this->read_tag();
        }

        // text
        $node = new simple_html_dom_node($this);
        ++$this->cursor;
        $node->_[HDOM_INFO_TEXT] = $s;
        $this->link_nodes($node, false);
        return true;
    }

    // PAPERG - dkchou - added this to try to identify the character set of the page we have just parsed so we know better how to spit it out later.
    // NOTE: IF you provide a routine called get_last_retrieve_url_contents_content_type which returns the CURLINFO_CONTENT_TYPE from the last curl_exec
    // (or the content_type header from the last transfer), we will parse THAT, and if a charset is specified, we will use it over any other mechanism.
    protected function parse_charset()
    {
        global $debugObject;

        $charset = null;

        if (function_exists('get_last_retrieve_url_contents_content_type'))
        {
            $contentTypeHeader = get_last_retrieve_url_contents_content_type();
            $success = preg_match('/charset=(.+)/', $contentTypeHeader, $matches);
            if ($success)
            {
                $charset = $matches[1];
                if (is_object($debugObject)) {$debugObject->debugLog(2, 'header content-type found charset of: ' . $charset);}
            }

        }

        if (empty($charset))
        {
            $el = $this->root->find('meta[http-equiv=Content-Type]',0);
            if (!empty($el))
            {
                $fullvalue = $el->content;
                if (is_object($debugObject)) {$debugObject->debugLog(2, 'meta content-type tag found' . $fullvalue);}

                if (!empty($fullvalue))
                {
                    $success = preg_match('/charset=(.+)/', $fullvalue, $matches);
                    if ($success)
                    {
                        $charset = $matches[1];
                    }
                    else
                    {
                        // If there is a meta tag, and they don't specify the character set, research says that it's typically ISO-8859-1
                        if (is_object($debugObject)) {$debugObject->debugLog(2, 'meta content-type tag couldn\'t be parsed. using iso-8859 default.');}
                        $charset = 'ISO-8859-1';
                    }
                }
            }
        }

        // If we couldn't find a charset above, then let's try to detect one based on the text we got...
        if (empty($charset))
        {
            // Have php try to detect the encoding from the text given to us.
            $charset = mb_detect_encoding($this->root->plaintext . "ascii", $encoding_list = array( "UTF-8", "CP1252" ) );
            if (is_object($debugObject)) {$debugObject->debugLog(2, 'mb_detect found: ' . $charset);}

            // and if this doesn't work...
            // then we need to just wrongheadedly assume it's UTF-8 so that we can move on - cause this will usually give us most of what we need...
            if ($charset === false)
            {
                if (is_object($debugObject)) {$debugObject->debugLog(2, 'since mb_detect failed - using default of utf-8');}
                $charset = 'UTF-8';
            }
        }

        // Since CP1252 is a superset, if we get one of its subsets, we want it instead.
        if ((strtolower($charset) == strtolower('ISO-8859-1')) || (strtolower($charset) == strtolower('Latin1')) || (strtolower($charset) == strtolower('Latin-1')))
        {
            if (is_object($debugObject)) {$debugObject->debugLog(2, 'replacing ' . $charset . ' with CP1252 as its a superset');}
            $charset = 'CP1252';
        }

        if (is_object($debugObject)) {$debugObject->debugLog(1, 'EXIT - ' . $charset);}

        return $this->_charset = $charset;
    }

    // read tag info
    protected function read_tag()
    {
        if ($this->char!=='<')
        {
            $this->root->_[HDOM_INFO_END] = $this->cursor;
            return false;
        }
        $begin_tag_pos = $this->pos;
        $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next

        // end tag
        if ($this->char==='/')
        {
            $this->char = (++$this->pos<$this->size) ?
                $this->doc[$this->pos] : null; // next
            // This represents the change in the simple_html_dom trunk from revision 180 to 181.
            // $this->skip($this->token_blank_t);
            $this->skip($this->token_blank);
            $tag = $this->copy_until_char('>');

            // skip attributes in end tag
            if (($pos = strpos($tag, ' '))!==false)
                $tag = substr($tag, 0, $pos);

            $parent_lower = strtolower($this->parent->tag);
            $tag_lower = strtolower($tag);

            if ($parent_lower!==$tag_lower)
            {
                if (isset($this->optional_closing_tags[$parent_lower]) && isset($this->block_tags[$tag_lower]))
                {
                    $this->parent->_[HDOM_INFO_END] = 0;
                    $org_parent = $this->parent;

                    while (($this->parent->parent) && strtolower($this->parent->tag)!==$tag_lower)
                        $this->parent = $this->parent->parent;

                    if (strtolower($this->parent->tag)!==$tag_lower) {
                        $this->parent = $org_parent; // restore original parent
                        if ($this->parent->parent) $this->parent = $this->parent->parent;
                        $this->parent->_[HDOM_INFO_END] = $this->cursor;
                        return $this->as_text_node($tag);
                    }
                }
                else if (($this->parent->parent) && isset($this->block_tags[$tag_lower]))
                {
                    $this->parent->_[HDOM_INFO_END] = 0;
                    $org_parent = $this->parent;

                    while (($this->parent->parent) && strtolower($this->parent->tag)!==$tag_lower)
                        $this->parent = $this->parent->parent;

                    if (strtolower($this->parent->tag)!==$tag_lower)
                    {
                        $this->parent = $org_parent; // restore original parent
                        $this->parent->_[HDOM_INFO_END] = $this->cursor;
                        return $this->as_text_node($tag);
                    }
                }
                else if (($this->parent->parent) && strtolower($this->parent->parent->tag)===$tag_lower)
                {
                    $this->parent->_[HDOM_INFO_END] = 0;
                    $this->parent = $this->parent->parent;
                }
                else
                    return $this->as_text_node($tag);
            }

            $this->parent->_[HDOM_INFO_END] = $this->cursor;
            if ($this->parent->parent) $this->parent = $this->parent->parent;

            $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
            return true;
        }

        $node = new simple_html_dom_node($this);
        $node->_[HDOM_INFO_BEGIN] = $this->cursor;
        ++$this->cursor;
        $tag = $this->copy_until($this->token_slash);
        $node->tag_start = $begin_tag_pos;

        // doctype, cdata & comments...
        if (isset($tag[0]) && $tag[0]==='!') {
            $node->_[HDOM_INFO_TEXT] = '<' . $tag . $this->copy_until_char('>');

            if (isset($tag[2]) && $tag[1]==='-' && $tag[2]==='-') {
                $node->nodetype = HDOM_TYPE_COMMENT;
                $node->tag = 'comment';
            } else {
                $node->nodetype = HDOM_TYPE_UNKNOWN;
                $node->tag = 'unknown';
            }
            if ($this->char==='>') $node->_[HDOM_INFO_TEXT].='>';
            $this->link_nodes($node, true);
            $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
            return true;
        }

        // text
        if ($pos=strpos($tag, '<')!==false) {
            $tag = '<' . substr($tag, 0, -1);
            $node->_[HDOM_INFO_TEXT] = $tag;
            $this->link_nodes($node, false);
            $this->char = $this->doc[--$this->pos]; // prev
            return true;
        }

        // The hyphen is escaped so this class also compiles under PCRE2 (PHP 7.3+).
        if (!preg_match("/^[\w\-:]+$/", $tag)) {
            $node->_[HDOM_INFO_TEXT] = '<' . $tag . $this->copy_until('<>');
            if ($this->char==='<') {
                $this->link_nodes($node, false);
                return true;
            }

            if ($this->char==='>') $node->_[HDOM_INFO_TEXT].='>';
            $this->link_nodes($node, false);
            $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
            return true;
        }

        // begin tag
        $node->nodetype = HDOM_TYPE_ELEMENT;
        $tag_lower = strtolower($tag);
        $node->tag = ($this->lowercase) ?
            $tag_lower : $tag;

        // handle optional closing tags
        if (isset($this->optional_closing_tags[$tag_lower]) )
        {
            while (isset($this->optional_closing_tags[$tag_lower][strtolower($this->parent->tag)]))
            {
                $this->parent->_[HDOM_INFO_END] = 0;
                $this->parent = $this->parent->parent;
            }
            $node->parent = $this->parent;
        }

        $guard = 0; // prevent infinite loop
        $space = array($this->copy_skip($this->token_blank), '', '');

        // attributes
        do
        {
            if ($this->char!==null && $space[0]==='')
            {
                break;
            }
            $name = $this->copy_until($this->token_equal);
            if ($guard===$this->pos)
            {
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                continue;
            }
            $guard = $this->pos;

            // handle endless '<'
            if ($this->pos>=$this->size-1 && $this->char!=='>') {
                $node->nodetype = HDOM_TYPE_TEXT;
                $node->_[HDOM_INFO_END] = 0;
                $node->_[HDOM_INFO_TEXT] = '<'.$tag . $space[0] . $name;
                $node->tag = 'text';
                $this->link_nodes($node, false);
                return true;
            }

            // handle mismatch '<'
            if ($this->doc[$this->pos-1]=='<') {
                $node->nodetype = HDOM_TYPE_TEXT;
                $node->tag = 'text';
                $node->attr = array();
                $node->_[HDOM_INFO_END] = 0;
                $node->_[HDOM_INFO_TEXT] = substr($this->doc, $begin_tag_pos, $this->pos-$begin_tag_pos-1);
                $this->pos -= 2;
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                $this->link_nodes($node, false);
                return true;
            }

            if ($name!=='/' && $name!=='') {
                $space[1] = $this->copy_skip($this->token_blank);
                $name = $this->restore_noise($name);
                if ($this->lowercase) $name = strtolower($name);
                if ($this->char==='=') {
                    $this->char = (++$this->pos<$this->size) ?
$this->doc[$this->pos] : null; // next1440 $this->parse_attr($node, $name, $space);1441 }1442 else {1443 //no value attr: nowrap, checked selected...1444 $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_NO;1445 $node->attr[$name] = true;1446 if ($this->char!='>') $this->char = $this->doc[--$this->pos]; // prev1447 }1448 $node->_[HDOM_INFO_SPACE][] = $space;1449 $space = array($this->copy_skip($this->token_blank), '', '');1450 }1451 else1452 break;1453 } while ($this->char!=='>' && $this->char!=='/');1454 1455 $this->link_nodes($node, true);1456 $node->_[HDOM_INFO_ENDSPACE] = $space[0];1457 1458 // check self closing1459 if ($this->copy_until_char_escape('>')==='/')1460 {1461 $node->_[HDOM_INFO_ENDSPACE] .= '/';1462 $node->_[HDOM_INFO_END] = 0;1463 }1464 else1465 {1466 // reset parent1467 if (!isset($this->self_closing_tags[strtolower($node->tag)])) $this->parent = $node;1468 }1469 $this->char = ( $this->pos<$this->size) ? $this->doc[$this->pos] : null; // next1470 1471 // If it's a BR tag, we need to set it's text to the default text.1472 // This way when we see it in plaintext, we can generate formatting that the user wants.1473 // since a br tag never has sub nodes, this works well.1474 if ($node->tag == "br")1475 {1476 $node->_[HDOM_INFO_INNER] = $this->default_br_text;1477 }1478 1479 return true;1480 }1481 1482 // parse attributes1483 protected function parse_attr($node, $name, &$space)1484 {1485 // Per sourceforge: http://sourceforge.net/tracker/?func=detail&aid=3061408&group_id=218559&atid=10440371486 // If the attribute is already defined inside a tag, only pay atetntion to the first one as opposed to the last one.1487 if (isset($node->attr[$name]))1488 {1489 return;1490 }1491 1492 $space[2] = $this->copy_skip($this->token_blank);1493 switch ($this->char) {1494 case '"':1495 $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_DOUBLE;1496 $this->char = ( $this->pos<$this->size) ? 
$this->doc[$this->pos] : null; // next1497 $node->attr[$name] = $this->restore_noise($this->copy_until_char_escape('"'));1498 $this->char = ( $this->pos<$this->size) ? $this->doc[$this->pos] : null; // next1499 break;1500 case ''':1501 $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_SINGLE;1502 $this->char = ( $this->pos<$this->size) ? $this->doc[$this->pos] : null; // next1503 $node->attr[$name] = $this->restore_noise($this->copy_until_char_escape('''));1504 $this->char = ( $this->pos<$this->size) ? $this->doc[$this->pos] : null; // next1505 break;1506 default:1507 $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_NO;1508 $node->attr[$name] = $this->restore_noise($this->copy_until($this->token_attr));1509 }1510 // PaperG: Attributes should not have r or n in them, that counts as html whitespace.1511 $node->attr[$name] = str_replace("r", "", $node->attr[$name]);1512 $node->attr[$name] = str_replace("n", "", $node->attr[$name]);1513 // PaperG: If this is a "class" selector, lets get rid of the preceeding and trailing space since some people leave it in the multi class case.1514 if ($name == "class") {1515 $node->attr[$name] = trim($node->attr[$name]);1516 }1517 }1518 1519 // link node's parent1520 protected function link_nodes(&$node, $is_child)1521 {1522 $node->parent = $this->parent;1523 $this->parent->nodes[] = $node;1524 if ($is_child)1525 {1526 $this->parent->children[] = $node;1527 }1528 }1529 1530 // as a text node1531 protected function as_text_node($tag)1532 {1533 $node = new simple_html_dom_node($this);1534 $this->cursor;1535 $node->_[HDOM_INFO_TEXT] = '</' . $tag . '>';1536 $this->link_nodes($node, false);1537 $this->char = ( $this->pos<$this->size) ? $this->doc[$this->pos] : null; // next1538 return true;1539 }1540 1541 protected function skip($chars)1542 {1543 $this->pos = strspn($this->doc, $chars, $this->pos);1544 $this->char = ($this->pos<$this->size) ? 
$this->doc[$this->pos] : null; // next1545 }1546 1547 protected function copy_skip($chars)1548 {1549 $pos = $this->pos;1550 $len = strspn($this->doc, $chars, $pos);1551 $this->pos = $len;1552 $this->char = ($this->pos<$this->size) ? $this->doc[$this->pos] : null; // next1553 if ($len===0) return '';1554 return substr($this->doc, $pos, $len);1555 }1556 1557 protected function copy_until($chars)1558 {1559 $pos = $this->pos;1560 $len = strcspn($this->doc, $chars, $pos);1561 $this->pos = $len;1562 $this->char = ($this->pos<$this->size) ? $this->doc[$this->pos] : null; // next1563 return substr($this->doc, $pos, $len);1564 }1565 1566 protected function copy_until_char($char)1567 {1568 if ($this->char===null) return '';1569 1570 if (($pos = strpos($this->doc, $char, $this->pos))===false) {1571 $ret = substr($this->doc, $this->pos, $this->size-$this->pos);1572 $this->char = null;1573 $this->pos = $this->size;1574 return $ret;1575 }1576 1577 if ($pos===$this->pos) return '';1578 $pos_old = $this->pos;1579 $this->char = $this->doc[$pos];1580 $this->pos = $pos;1581 return substr($this->doc, $pos_old, $pos-$pos_old);1582 }1583 1584 protected function copy_until_char_escape($char)1585 {1586 if ($this->char===null) return '';1587 1588 $start = $this->pos;1589 while (1)1590 {1591 if (($pos = strpos($this->doc, $char, $start))===false)1592 {1593 $ret = substr($this->doc, $this->pos, $this->size-$this->pos);1594 $this->char = null;1595 $this->pos = $this->size;1596 return $ret;1597 }1598 1599 if ($pos===$this->pos) return '';1600 1601 if ($this->doc[$pos-1]==='') {1602 $start = $pos 1;1603 continue;1604 }1605 1606 $pos_old = $this->pos;1607 $this->char = $this->doc[$pos];1608 $this->pos = $pos;1609 return substr($this->doc, $pos_old, $pos-$pos_old);1610 }1611 }1612 1613 // remove noise from html content1614 // save the noise in the $this->noise array.1615 protected function remove_noise($pattern, $remove_tag=false)1616 {1617 global $debugObject;1618 if (is_object($debugObject)) { 
$debugObject->debugLogEntry(1); }1619 1620 $count = preg_match_all($pattern, $this->doc, $matches, PREG_SET_ORDER|PREG_OFFSET_CAPTURE);1621 1622 for ($i=$count-1; $i>-1; --$i)1623 {1624 $key = '___noise___'.sprintf('% 5d', count($this->noise) 1000);1625 if (is_object($debugObject)) { $debugObject->debugLog(2, 'key is: ' . $key); }1626 $idx = ($remove_tag) ? 0 : 1;1627 $this->noise[$key] = $matches[$i][$idx][0];1628 $this->doc = substr_replace($this->doc, $key, $matches[$i][$idx][1], strlen($matches[$i][$idx][0]));1629 }1630 1631 // reset the length of content1632 $this->size = strlen($this->doc);1633 if ($this->size>0)1634 {1635 $this->char = $this->doc[0];1636 }1637 }1638 1639 // restore noise to html content1640 function restore_noise($text)1641 {1642 global $debugObject;1643 if (is_object($debugObject)) { $debugObject->debugLogEntry(1); }1644 1645 while (($pos=strpos($text, '___noise___'))!==false)1646 {1647 // Sometimes there is a broken piece of markup, and we don't GET the pos 11 etc... token which indicates a problem outside of us...1648 if (strlen($text) > $pos 15)1649 {1650 $key = '___noise___'.$text[$pos 11].$text[$pos 12].$text[$pos 13].$text[$pos 14].$text[$pos 15];1651 if (is_object($debugObject)) { $debugObject->debugLog(2, 'located key of: ' . $key); }1652 1653 if (isset($this->noise[$key]))1654 {1655 $text = substr($text, 0, $pos).$this->noise[$key].substr($text, $pos 16);1656 }1657 else1658 {1659 // do this to prevent an infinite loop.1660 $text = substr($text, 0, $pos).'UNDEFINED NOISE FOR KEY: '.$key . substr($text, $pos 16);1661 }1662 }1663 else1664 {1665 // There is no valid key being given back to us... We must get rid of the ___noise___ or we will have a problem.1666 $text = substr($text, 0, $pos).'NO NUMERIC NOISE KEY' . 
substr($text, $pos 11);1667 }1668 }1669 return $text;1670 }1671 1672 // Sometimes we NEED one of the noise elements.1673 function search_noise($text)1674 {1675 global $debugObject;1676 if (is_object($debugObject)) { $debugObject->debugLogEntry(1); }1677 1678 foreach($this->noise as $noiseElement)1679 {1680 if (strpos($noiseElement, $text)!==false)1681 {1682 return $noiseElement;1683 }1684 }1685 }1686 function __toString()1687 {1688 return $this->root->innertext();1689 }1690 1691 function __get($name)1692 {1693 switch ($name)1694 {1695 case 'outertext':1696 return $this->root->innertext();1697 case 'innertext':1698 return $this->root->innertext();1699 case 'plaintext':1700 return $this->root->text();1701 case 'charset':1702 return $this->_charset;1703 case 'target_charset':1704 return $this->_target_charset;1705 }1706 }1707 1708 // camel naming conventions1709 function childNodes($idx=-1) {return $this->root->childNodes($idx);}1710 function firstChild() {return $this->root->first_child();}1711 function lastChild() {return $this->root->last_child();}1712 function createElement($name, $value=null) {return @str_get_html("<$name>$value</$name>")->first_child();}1713 function createTextNode($value) {return @end(str_get_html($value)->nodes);}1714 function getElementById($id) {return $this->find("#$id", 0);}1715 function getElementsById($id, $idx=null) {return $this->find("#$id", $idx);}1716 function getElementByTagName($name) {return $this->find($name, 0);}1717 function getElementsByTagName($name, $idx=-1) {return $this->find($name, $idx);}1718 function loadFile() {$args = func_get_args();$this->load_file($args);}1719 }1720 1721 ?>
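remove_noise() and restore_noise() above shield content the parser should not look inside (scripts, for example): every regex match is cut out, stored under a fixed-width key made of "___noise___" plus five characters, and pasted back later. The following is a simplified, self-contained sketch of that round trip, not the library's own code; the names stash_noise() and unstash_noise() and the script-matching pattern are assumptions chosen for illustration.

```php
<?php
// Simplified sketch of the "noise" placeholder technique; not simple_html_dom's own code.
// stash_noise()/unstash_noise() are hypothetical names chosen for this example.

// Cut every match of $pattern out of $doc, remember it in $noise, and leave a fixed-width key behind.
function stash_noise(string $doc, string $pattern, array &$noise): string
{
    $count = preg_match_all($pattern, $doc, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
    // Walk from the last match to the first so the byte offsets of earlier matches stay valid.
    for ($i = $count - 1; $i >= 0; --$i) {
        // '% 5d' pads to 5 characters, so every key is 11 + 5 = 16 bytes long.
        $key = '___noise___' . sprintf('% 5d', count($noise) + 1000);
        [$text, $offset] = $matches[$i][0];
        $noise[$key] = $text;
        $doc = substr_replace($doc, $key, $offset, strlen($text));
    }
    return $doc;
}

// Put the stored fragments back wherever a 16-byte key appears.
function unstash_noise(string $text, array $noise): string
{
    $offset = 0;
    while (($pos = strpos($text, '___noise___', $offset)) !== false) {
        $key = substr($text, $pos, 16);
        $value = $noise[$key] ?? $key;        // unknown keys are left as they are
        $text = substr($text, 0, $pos) . $value . substr($text, $pos + 16);
        $offset = $pos + strlen($value);      // continue after the inserted fragment
    }
    return $text;
}

$noise = [];
$html = '<p>a</p><script>var s = "<b>";</script><p>b</p>';
$clean = stash_noise($html, '#<script\b[^>]*>.*?</script>#is', $noise);
echo $clean, "\n"; // the <script> block is replaced by a key
echo unstash_noise($clean, $noise) === $html ? "round trip ok\n" : "mismatch\n";
```

Processing matches back to front is the same choice the library makes: replacing a later match never shifts the offsets of the earlier ones, so the offsets captured by preg_match_all() stay usable.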

3: Create the scraper script test.php with the following code:

<?php
//error_reporting(0);
set_time_limit(0);
include_once 'simple_html_dom.php';
echo '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />';
// Share text copied from the Douyin app (kept verbatim); only the v.douyin.com link in it is used.
//$data = '#在抖音，记录美好生活#【库存】抛个硬币如果摔碎了今天我就不吃饭了 #校园生活 #大学 https://v.douyin.com/p9taJa/ 复制此链接，打开【抖音短视频】，直接观看视频！';
$data = '#科比愿你去的地方也有篮球陪伴，也能披着24号紫金战衣！ #动态壁纸 https://v.douyin.com/36xkCS/ 复制此链接，打开【抖音短视频】，直接观看视频！';
$data = getData($data);
echo json_encode($data);

function getData($data){
    $url = getUrl($data);
    $cookie_jar = dirname(__FILE__).'/tmp.txt';//tempnam('./tmp','cookie');
    $data = get_content($url, $cookie_jar);

    $page = str_get_html($data);

    $data = array(
        'base'=>array(
            'headimg'=>false, // avatar
            'name'=>false, // nickname
            'title'=>false, // title (for lack of a better name)
            'description'=>false // description
        ),
        'video'=>array(
            'cover'=>false, // cover image
            'src'=>false, // video URL
            'width'=>false, // width
            'height'=>false // height
        )
    );
    $user = $page->find('div[class=user-info]');
    // avatar and nickname
    if(count($user) > 0){
        $img = $user[0]->find('div[class=avatar]');
        if(count($img) > 0){
            $img = $img[0]->find('img');
            if(count($img) > 0){
                // avatar
                $data['base']['headimg'] = $img[0]->src;
                // nickname
                $data['base']['name'] = $img[0]->alt;
            }
        }
    }
    // title and description
    $title = $page->find('div[class=challenge-info]');
    if(count($title) > 0){
        $description = $title[0]->next_sibling();
        $title = $title[0]->first_child()->first_child();
        $data['base']['title'] = $title->innertext;
        $data['base']['description'] = $description->innertext;
    }
    $video = $page->find('div[id=pageletReflowVideo]');
    if(count($video) > 0){
        $script = $video[0]->next_sibling();
        if(!empty($script)){
            $script = $script->next_sibling();
            if(!empty($script)){
                $script = $script->next_sibling()->innertext;
                $data['video'] = getVideo($script);
            }
        }
    }
    return $data;
}

function getVideo($scripts){
    $video = array();
    $scripts = preg_replace('/\s+/','',$scripts); // strip all whitespace
    // width
    preg_match('/videoWidth:([0-9.]*),/' , $scripts, $matches);
    if(empty($matches) || count($matches) < 2){
        $video['width'] = false;
    }else{
        $video['width'] = $matches[1];
    }
    // height
    preg_match('/videoHeight:([0-9.]*),/' , $scripts, $matches);
    if(empty($matches) || count($matches) < 2){
        $video['height'] = false;
    }else{
        $video['height'] = $matches[1];
    }
    // video URL
    preg_match('/playAddr:"(.*)",/' , $scripts, $matches);
    if(empty($matches) || count($matches) < 2){
        $video['src'] = false;
    }else{
        $video['src'] = $matches[1];
    }
    // cover image
    preg_match('/cover:"(.*)"}/' , $scripts, $matches);
    if(empty($matches) || count($matches) < 2){
        $video['cover'] = false;
    }else{
        $video['cover'] = $matches[1];
    }
    return $video;
}
function get_content($url, $cookie,$referfer='') {
    //var_dump($post);exit;
    $useragent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)";
    /*if ($curl_loops >= $curl_max_loops) {
        $curl_loops = 0;
        return false;
    }*/
    if($referfer == ''){
        $referfer = 'https://www.kujiale.com/';
    }
    $header = array("Referer: ".$referfer);
    $curl = curl_init(); // initialize a cURL session
    curl_setopt($curl, CURLOPT_URL, $url); // URL to request
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // do not verify the certificate
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false); // do not verify the host name in the certificate
    curl_setopt($curl, CURLOPT_HEADER, 1); // include response headers in the output
    curl_setopt($curl, CURLOPT_HTTPHEADER,$header);
    //curl_setopt ($curl,CURLOPT_REFERER,'http://www.kujiale.com/');
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
    curl_setopt($curl, CURLOPT_COOKIEFILE, $cookie); // read cookies from this file
    curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie); // save cookies to this file
    //curl_setopt($curl, CURLOPT_POST, 1); // submit via POST
    //curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($post)); // data to submit
    //curl_setopt($curl,CURLOPT_POSTFIELDS,$post);

    curl_setopt($curl, CURLOPT_USERAGENT, $useragent);
    //curl_setopt($curl, CURLOPT_REFERER, 'http://www.kujiale.com/');
    $data = curl_exec($curl); // execute the request
    $ret = $data;
    list($header, $data) = explode("\r\n\r\n", $data, 2);
    $http_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    $last_url = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
    //var_dump($last_url);
    //$httpCode = curl_getinfo($curl,CURLINFO_HTTP_CODE);
    //var_dump($httpCode);
    //echo '<hr/>';
    curl_close($curl); // close the cURL handle and free its resources
    if ($http_code == 301 || $http_code == 302) {
        $matches = array();
        // header names may be lower-case (e.g. over HTTP/2), hence the i flag
        preg_match('/^Location:(.*)$/mi', $header, $matches);
        $url = @parse_url(trim((string) array_pop($matches)));
        if (!$url || !isset($url['scheme'], $url['host'])) {
            $curl_loops = 0;
            return $data;
        }
        $new_url = $url['scheme'] . '://' . $url['host'] . ($url['path'] ?? '')
        . (isset($url['query']) ? '?' . $url['query'] : '');
        $new_url = stripslashes($new_url);
        return get_content($new_url,$cookie);
    } else {
        $curl_loops = 0;
        list($header, $data) = explode("\r\n\r\n", $ret, 2);
        return $data;
    }
}
function getUrl($data){
    preg_match('/https:\/\/v\.douyin\.com\/.*\//' , $data, $matches);
    if(empty($matches) || count($matches) != 1){
        return false;
    }else{
        return $matches[0];
    }
}
?>
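get_content() asks cURL to include the response headers (CURLOPT_HEADER = 1), splits them off at the first blank line, and follows 301/302 redirects itself by reading the Location header. The sketch below isolates those two steps so they can be checked without a network connection; split_response() and redirect_location() are hypothetical names, and the sample responses are made up.

```php
<?php
// Sketch of two steps get_content() performs by hand; function names are hypothetical.

// Split a raw response captured with CURLOPT_HEADER = 1 into [headers, body]
// at the first blank line (CRLF CRLF).
function split_response(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    return [$parts[0], $parts[1] ?? ''];
}

// Return the Location header value (the redirect target), or null if there is none.
function redirect_location(string $headers): ?string
{
    // /i: header names are case-insensitive; /m: ^ matches at the start of every header line.
    // [ \t]* instead of \s* so an empty Location header cannot swallow the next line.
    if (preg_match('/^Location:[ \t]*(\S+)/mi', $headers, $m)) {
        return $m[1];
    }
    return null;
}

// Made-up redirect response used only to exercise the functions.
$raw = "HTTP/1.1 302 Found\r\nlocation: https://www.example.com/share/video/123/?region=CN\r\n\r\n";
[$headers, $body] = split_response($raw);
var_dump(redirect_location($headers));
```

As an alternative design, cURL can follow redirects on its own when CURLOPT_FOLLOWLOCATION is enabled (optionally capped with CURLOPT_MAXREDIRS); the final address is then available from curl_getinfo($curl, CURLINFO_EFFECTIVE_URL), which the script already reads into $last_url. Note that with automatic redirects and CURLOPT_HEADER on, the output contains one header block per hop, so splitting at the first blank line is no longer enough.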

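Replace $data in test.php with the share text you copied, and that is all. The script returns JSON, which you can render directly on the front end or store in a database.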
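getUrl() picks the short link out of the pasted share text with a regular expression. A slightly stricter sketch could look like the one below; it assumes the link always has the form https://v.douyin.com/<code>/ with an alphanumeric code, an assumption based only on the two samples in test.php, and extract_share_url() is a hypothetical name.

```php
<?php
// Hypothetical helper: extract the first v.douyin.com short link from pasted share text.
// Assumes the form https://v.douyin.com/<alphanumeric code>/ seen in the test.php samples.
function extract_share_url(string $shareText): ?string
{
    // '#' as the delimiter avoids escaping every '/' in the URL.
    if (preg_match('#https://v\.douyin\.com/[A-Za-z0-9]+/?#', $shareText, $m)) {
        return $m[0];
    }
    return null;
}

$share = '#科比愿你去的地方也有篮球陪伴，也能披着24号紫金战衣！ #动态壁纸 https://v.douyin.com/36xkCS/ 复制此链接，打开【抖音短视频】，直接观看视频！';
echo json_encode(['url' => extract_share_url($share)], JSON_UNESCAPED_SLASHES), "\n";
```

JSON_UNESCAPED_SLASHES keeps the URL free of "\/" escapes; adding JSON_UNESCAPED_UNICODE does the same for the Chinese nicknames and titles that getData() returns, which makes the stored JSON easier to read.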

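That completes scraping video content from Douyin. The write-up is not perfect, so please go easy on me. If you have any suggestions, leave them in the comments; I will reply when I see them. Thanks.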

A First Look at Four Highlights of the 4th Heilongjiang Green Expo

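China.com Heilongjiang Channel: The 4th Heilongjiang International Green and Organic Food Industry Expo, held together with the Harbin World Agriculture Expo and co-hosted by the Heilongjiang Provincial People's Government and the Harbin Municipal People's Government, opens on the 22nd at the Harbin International Convention, Exhibition and Sports Center. The expo runs from September 22 to 26.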

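Li Deshan, vice president of the Heilongjiang Council for the Promotion of International Trade and director of the Exhibition Affairs Bureau, introduces the Green Expo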

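Li Deshan told reporters that this year's Green Expo has exhibitors from 9 countries and regions, including Russia, France, Thailand, South Korea, Malaysia, Vietnam, Hong Kong and Taiwan, and 930 companies from 27 provinces (autonomous regions and municipalities) and cities with independent planning status. The total exhibition area is 68,000 m², of which 36,000 m² is indoors with 1,720 international standard booths, and 25,000 m² is outdoors.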

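In its theme, the expo closely follows the call General Secretary Xi Jinping made during his inspection of Heilongjiang to "start with grain and end with food, start with agriculture and end with industry, and build a modern agricultural industry system, production system and management system," and sets out to show how clear waters and green mountains are as valuable as mountains of gold and silver.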

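In terms of content, the rice-themed area brings together more than 40 leading rice processors showing Wuchang rice across 500 m²; the 189 m² Internet agriculture area features more than 10 leading agricultural Internet companies such as Beijing Nongxin Hulian and Nongguanjia; and the 816 m² agricultural science and technology area shows more than 300 exhibits and more than 300 agricultural research results and technologies.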

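In terms of format, compared with the previous expo the share of traditional image, text and physical displays has fallen while the number of agricultural technology products has grown, for example a real-time big-data display of grain trading, Root robots, and straw baling equipment. The displays are innovative, simple and fresh, and the share of custom-built booths has increased.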

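The 4th Green Expo stands out in four respects. First, a strong lineup of exhibitors and buyers. Government delegations from 27 provinces (autonomous regions and municipalities) and cities with independent planning status are leading local companies to exhibit and attend, and are also taking part in the 13th Heilongjiang Golden Autumn Grain Trade and Industrial Cooperation Fair. Participants include 2 Fortune Global 500 companies and 9 China Top 500 companies, and 42 company chairmen and more than 70 general managers have been invited. Second, ecological-agriculture innovation and entrepreneurship are a highlight. The first Heilongjiang Ecological Agriculture Innovation and Entrepreneurship Forum and Achievements Exhibition is the expo's theme area, covering 600 m² in 6 zones, with nearly 50 start-up teams showing their latest results. Third, the supporting events are rich and practical. Nearly 20 themed events are being held to promote Heilongjiang's overall ecological advantages, development potential and innovation, building consensus for the development of the green food industry. Fourth, the expo is more specialized. This year's event leans further toward specialization, with a distinctive rice-themed area that promotes the culture of Heilongjiang rice and the idea of rice as a healthy staple. Rice-tasting sessions are held every morning and afternoon during the expo: visitors taste the rice on site and vote based on its texture. The agricultural machinery area features more than 20 well-known domestic and international manufacturers, including John Deere, Case New Holland, YTO and Foton Lovol, exhibiting more than 1,500 machines and equipment sets; the new equipment on show, such as combined corn ear-and-stalk harvesters, side-deep fertilization and straw pickup balers, is at an internationally advanced level. The outdoor area also has a display on the Shuangyashan wetland, which the World Wide Fund for Nature has designated one of its important ecoregions. In addition, there are themed areas for the Internet, agricultural technology, innovation and entrepreneurship, and renowned food products from other provinces, as well as the Golden Autumn Grain Trade Fair area. Greater specialization is drawing production factors together, strengthening distinctive brands, and integrating production, innovation, sales, distribution and e-commerce, turning the country's great granary into a green granary, a green vegetable garden and a green kitchen.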

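The expo organizing committee told reporters that tickets bought online before September 25 are 25% off: a ticket normally priced at 20 yuan costs only 15 yuan. There will also be a daily two-hour flash sale, from 12:00 to 13:00 and from 19:00 to 20:00, during which tickets are half price, 10 yuan instead of 20 yuan.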

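To buy tickets, search WeChat for the official account "黑龙江省会展事务局" (Heilongjiang Exhibition Affairs Bureau) and follow it, then tap the "绿博会" (Green Expo) option in the menu at the bottom of the screen and choose "在线购票" (buy tickets online). While the expo is open, exchange the QR code from your purchase for a ticket and a gift voucher at the on-site ticket windows. Gifts can be redeemed from September 22 to 26, 2016, at fire passage AB or BC inside the exhibition hall, while supplies last.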

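Every ticket comes with gifts worth 50 yuan: a bag of Heliang Rice eight-treasure rice or Daohuaxiang rice worth 20 yuan; a bottle of Zhongzhi blueberry drink worth 10 yuan; a bottle of Junshengbao drink worth 10 yuan; and a 10-yuan Bingcheng Dairy voucher.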

MiG-29 Fighter

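The MiG-29 (Russian: Микоян МиГ-29) is a twin-engine, high-performance air-superiority fighter designed in the 1970s by the Soviet Mikoyan-Gurevich Design Bureau. The Su-27 was developed in the same period; both were meant to counter the contemporary advanced American fighter, the F-15. NATO's reporting name for the MiG-29 is "Fulcrum".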

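Like the Su-27, the MiG-29's history began in 1969, when the Soviet Union learned that the US Air Force was running the "FX" program (the program that eventually produced the F-15). Soviet leaders realized that the new American aircraft would hold a large technological advantage over every existing Soviet fighter. The MiG-21 was highly maneuverable for its time but had considerable shortcomings in range, armament and growth potential. The MiG-23, developed mainly to counter the American F-4, was faster and had more room for fuel and equipment, but lacked the agility needed in a dogfight. What the Soviet Union lacked was a well-balanced fighter with excellent maneuverability and high-performance avionics. In response, the Soviet General Staff issued a requirement for an "Advanced Tactical Fighter" (Перспективный Фронтовой Истребитель, ПФИ / PFI) with demanding specifications, including long range, excellent short-field performance (including the ability to operate from austere airfields), high agility, a top speed above Mach 2, and heavy armament. The aerodynamic design of the new aircraft was entrusted to the Central Aerohydrodynamic Institute (TsAGI), which shared its results with the Sukhoi design bureau.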

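However, the Soviet Union concluded that the PFI would be too expensive to build in the numbers required, so in 1971 the program was split in two: the Heavy Advanced Tactical Fighter (Тяжёлый Перспективный Фронтовой Истребитель, ТПФИ) and the Light Advanced Tactical Fighter (Лёгкий Перспективный Фронтовой Истребитель, ЛПФИ). In the same period the US Air Force pursued a similar plan, launching the Lightweight Fighter program that produced the F-16 Fighting Falcon and the YF-17. The heavy fighter stayed with Sukhoi, while the light fighter went to the Mikoyan design bureau; the Su-27 "Flanker" was the result of the former. In 1974 Mikoyan proposed Product 9, which became the MiG-29A, and it first flew on 6 October 1977. A US reconnaissance satellite spotted the prototype in November of that year; because it was seen at the Zhukovsky (Жуковский) flight-test airfield near the town of Ramenskoye, it was given the designation Ram-L. Early Western assessments assumed that Ram-L resembled the YF-17 and was powered by afterburning R-25 turbojets.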

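Although the loss of two prototypes delayed the program, operational evaluation of production MiG-29Bs began at Kubinka air base in 1983. The evaluation was completed the following year, and deliveries began that same year. Because NATO did not know of the pre-production MiG-29A, the MiG-29B was given the reporting name "Fulcrum-A". A downgraded version of the MiG-29B was exported in large numbers to Warsaw Pact countries as the "MiG-29B 9-12A" (aircraft for customers outside the Warsaw Pact were designated "MiG-29B 9-12B"). A total of 840 of these aircraft were built; compared with the Soviet MiG-29B, they had reduced capabilities and could not deliver nuclear weapons. In July 1986 a group of Soviet MiG-29s visited Finland, and only then was the MiG-29 officially revealed to the Western public. In September 1988 the Soviet Union brought two MiG-29s to the Farnborough Airshow in Britain. Overall, Western observers were impressed by its maneuverability, though its conspicuous engine smoke was a major weakness.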

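The Soviet Union designated MiG-29 variants precisely, even when they differed only in their electronics; however, many of the versions Mikoyan proposed, such as the carrier-based MiG-29K, never entered series production. In the Soviet era the Mikoyan bureau had clearly less political influence than its rival Sukhoi, and MiG-29 development suffered many setbacks as a result. Many more advanced versions are still seeking export approval, and orders to upgrade older Soviet-era MiG-29s are still being pursued. Development of new variants such as the MiG-29SMT and MiG-29M1/M2 has only just begun. Work on the carrier-based MiG-29K resumed because of the Indian Navy's requirement for the aircraft carrier INS Vikramaditya (formerly the Soviet Navy's Admiral Gorshkov). The MiG-29K was originally designed for the carrier Admiral Kuznetsov, but the larger Su-33 was chosen instead.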

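In 2016, during the Syrian war, the Russian carrier Admiral Kuznetsov deployed to the conflict, and the new MiG-29K saw combat for the first time, flying ground-attack missions that destroyed a variety of enemy ground targets. During the deployment one MiG-29K crashed after a failed carrier landing, but the pilot ejected safely.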

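Because its basic design parameters came from TsAGI's calculations for the PFI program, the MiG-29's aerodynamic layout is very similar to that of the Su-27, with a few notable differences. The MiG-29's main structural material is aluminum, supplemented by some composites. Its mid-mounted wing is swept at 40 degrees and blended with leading-edge root extensions (LERX). At the rear of the fuselage, outboard of the engines, are swept horizontal tailplanes and twin vertical fins. The automatic leading-edge slats are divided into 4 sections on early models (5 on later models), and the trailing edge carries flaps and ailerons.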

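The MiG-29 uses hydraulic flight controls and an SAU-451 three-axis autopilot rather than the fly-by-wire system of the Su-27. Even so, it is a very agile fighter: its instantaneous and sustained turn performance are both close to the average of Western fighters, and above 400 knots its turn rate is similar to that of the F-16, with its energy-loss rate given as four times that of the F-16 (60% and 240%). The MiG-29 has good high angle-of-attack performance, resists entering a flat spin, and its airframe can withstand 9 g (about 88 m/s²) maneuvers. The control stick has a limiter, which can be overridden manually, to keep the pilot from exceeding g or angle-of-attack limits.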

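The MiG-29 is powered by two Klimov RD-33 turbofans, each producing 50.0 kN of thrust, or 81.3 kN with afterburner. The space between the two engines generates additional lift, lowering the wing loading and improving maneuverability. The wedge-shaped engine intakes sit beneath the leading-edge root extensions and have adjustable ramps to meet airflow demands at high Mach numbers. During takeoff, landing or low-speed flight from unprepared airstrips, the intakes can close almost completely to keep foreign objects on the ground from being ingested. In that case, auxiliary intakes on top of the leading-edge root extensions open automatically and supply the engines with the air they need. Later versions replaced these dorsal intakes with mesh screens in the main intakes, similar to those on the Su-27.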

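The early MiG-29B had a total internal fuel capacity of only 4,365 liters in six tanks, one in each wing and four in the fuselage. This gave the MiG-29B limited range, but it met the original Soviet requirement for a point-defense frontline fighter. For long-distance flights, a 1,500-liter drop tank can be carried on the fuselage centerline, and later production versions can also carry a 1,150-liter drop tank under each wing. In addition, a small number of MiG-29s were fitted with in-flight refueling equipment on the left side of the fuselage, using the probe-and-drogue method to extend endurance.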

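Some MiG-29Bs (MiG-29 9-13) have a modified dorsal spine (the "Fatback") that increases internal fuel capacity. More advanced later versions such as the MiG-35 can carry conformal fuel tanks on the spine.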

The MiG-29 is fitted with the K-36DM ejection seat, which has performed impressively in many emergencies.

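The cockpit has a head-up display and a ZSh-3UM helmet-mounted sight. However, the MiG-29 lacks the hands-on-throttle-and-stick (HOTAS) controls widely used on Western fighters. Notably, to make it easier for pilots to convert to the new type, the MiG-29's cockpit was kept as similar as possible to that of the earlier MiG-23 rather than being extensively redesigned for ergonomics. Thanks to its bubble canopy, visibility is much better than in earlier Soviet fighters, though still not as good as in contemporary Western fighters. Later upgraded versions have glass cockpits with multifunction displays and true HOTAS controls.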

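The MiG-29B is equipped with the Phazotron RLPK-29 fire-control system, which includes an N-019 pulse-Doppler radar (NATO: Slot Back) with look-down/shoot-down capability and a Ts100.02-02 fire-control computer. The early N-019A radar was claimed to put the MiG-29 on a par with its Western rivals, but Soviet Air Force experience with it was disappointing. Against fighter-sized targets beyond visual range, it offered tracking ranges of only 70 km head-on and 35 km tail-on. It could track 10 targets at once but could engage only one of them with a semi-active radar-homing missile. In look-down/shoot-down mode its signal degraded badly as range and ground clutter increased, and its resistance to electronic jamming was poor. These problems meant the aircraft could not reliably engage beyond-visual-range targets with R-27 missiles. The CIA later obtained information on the N-019 radar through espionage. To remedy these shortcomings, Phazotron developed the N-019M Topaz radar, fitted to the MiG-29S, but its performance still did not satisfy the Soviet military. The final upgraded version switched to the N-010 Zhuk-M radar with a passive phased-array antenna, greatly improving detection range and resolution and allowing multiple targets to be engaged with R-77 medium-range air-to-air missiles. In addition, the MiG-29 carries, in front of the windshield, the same S-31E2 electro-optical sensor as the Su-27, which can be slaved to the radar or operate independently and provides additional gun-aiming assistance.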

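The MiG-29's fixed armament is a single 30 mm GSh-30-1 cannon, with its muzzle in the left wing root. The original design carried 150 rounds, reduced to 100 in later versions. Because a centerline drop tank on early production MiG-29Bs blocked the shell-ejection port, the gun could not be used while that tank was fitted; the problem was not fixed until the MiG-29S and later versions. Depending on the variant, each wing has three or four hardpoints, for a total of 6 to 8. The inboard hardpoints can carry a 1,150-liter (300-gallon) drop tank, R-27 medium-range air-to-air missiles, or conventional bombs and rockets; on some versions they can also carry a nuclear bomb. The outboard hardpoints usually carry R-73 missiles, though in some cases the older R-60 short-range air-to-air missile is carried instead. The centerline hardpoint between the engines can carry a 1,500-liter (400-gallon) fuel tank but no combat weapons. The original MiG-29B could carry general-purpose bombs or rocket pods but could not use precision-guided weapons. Only later upgraded versions added the ability to carry laser- or electro-optically guided weapons, and only then could the MiG-29 use air-to-ground missiles.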

The MiG-29 can carry missiles, conventional bombs, rockets and drop tanks totaling 4,500 liters.

Specifications

General characteristics

Crew: 1

Length: 17.32 m (56 ft 10 in)

Wingspan: 11.36 m (37 ft 3 in)

Height: 4.73 m (15 ft 6 in)

Wing area: 38 m² (409 ft²)

Empty weight: 11,000 kg (24,250 lb)

Loaded weight: 14,900 kg (33,730 lb)

Max takeoff weight: 18,000 kg (44,100 lb)

Powerplant: 2 × Klimov RD-33 turbofans, 81.59 kN (18,342 lbf)

Internal fuel: 3,500 kg (7,716 lb)

Performance

Maximum speed: at high altitude, Mach 2.25 (2,400 km/h; 1,490 mph)

At low altitude: Mach 1.21 (1,500 km/h; 930 mph)

Ferry range: 2,100 km (1,300 mi; 1,130 nmi) with 1 drop tank

Range: 1,430 km (888 mi; 772 nmi) with maximum internal fuel[18]

Service ceiling: 18,000 m (59,100 ft)

Rate of climb: 330 m/s (65,000 ft/min) initial; 109 m/s (21,500 ft/min) average (0–6,000 m)[19]

Wing loading: 403 kg/m² (82 lb/ft²)

Thrust/weight: 1.09

Maximum g-load: 9 g

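Armament

Guns: 1 × 30 mm GSh-30-1 cannon (150 rounds)

Hardpoints: 7 (6 × underwing, 1 × fuselage), with a capacity of up to 4,000 kg (8,800 lb) of stores, designed to carry combinations of:

Rockets: S-5 rockets

S-8 rockets

S-24 rockets

Missiles: 2 × R-60 short-range air-to-air missiles

4 × R-60 medium-range air-to-air missiles

4 × R-73 short-range air-to-air missiles

Bombs: 6 × 665 kg (1,466 lb) bombs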

Avionics

Phazotron Zhuk-ME
