Installing and Using the tree Command on Linux

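On Linux, tree is a handy tool that lists a directory and its subdirectories as a tree diagram, making the directory structure clear at a glance. Not every Linux distribution installs it by default, however. This article explains how to install tree on a system where it is missing and then describes how to use it.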

Installing the tree command

On Debian-based systems (such as Ubuntu)

On Debian-based Linux distributions, you can install tree with the apt package manager. Open a terminal and run:

sudo apt update
sudo apt install tree

The apt update command refreshes the package index so that apt installs the newest version available in the repositories; apt install tree then installs the tree package.

On RPM-based systems (such as CentOS and Fedora)

On RPM-based Linux distributions, you can install tree with yum (on older CentOS releases) or dnf (on Fedora and newer CentOS releases).

On systems that use yum:

sudo yum install tree

On systems that use dnf:

sudo dnf install tree

On Arch Linux

On Arch Linux, you can install tree with the pacman package manager:

sudo pacman -S tree
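
The choice among the commands above depends only on which package manager is present, so it can be automated. The following is a minimal Python sketch, not part of any official tooling: the function name suggest_install_command and the lookup order are assumptions made for illustration, and the commands are the ones listed in this article.

```python
import shutil

# Package managers mentioned in this article, mapped to the install command shown for each.
# dnf is checked before yum because newer systems may ship both.
INSTALL_COMMANDS = {
    "apt": "sudo apt install tree",
    "dnf": "sudo dnf install tree",
    "yum": "sudo yum install tree",
    "pacman": "sudo pacman -S tree",
}


def suggest_install_command(which=shutil.which):
    """Return the install command for the first package manager found on PATH, or None."""
    for manager, command in INSTALL_COMMANDS.items():
        if which(manager) is not None:
            return command
    return None


# Report whether tree is already available, otherwise suggest a command.
if shutil.which("tree"):
    print("tree is already installed")
else:
    print(suggest_install_command() or "no known package manager found")
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which is enough.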

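Generic method (building from source)

If none of the methods above applies, or if you prefer to build from source, download the latest source package from the tree project's official website or another trusted source repository. After unpacking it, follow the instructions in the README file to compile and install. This approach is more involved and is usually unnecessary unless you need a specific version of tree or have other special requirements.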

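Using the tree command

Once tree is installed, you can start using it. Here are some basic usage examples:

Listing the tree structure of the current directory

Run tree in a terminal without any arguments, and it lists the current directory together with all of its subdirectories and files as a tree.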

tree

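Limiting the directory depth

If you only want to see the first few levels of the directory structure, use the -L option to limit the depth. For example, to show the contents of the current directory and of its immediate subdirectories (two levels), run: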

tree -L 2

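Showing directories only

If you are interested only in the directory structure and not in the files, use the -d option to show directories only.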

tree -d
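
To make the effect of -L and -d concrete, here is a minimal Python sketch that produces tree-like output. It is an illustration only, not how tree itself is implemented: the function name render_tree and its parameters are hypothetical, entries are sorted with Python's default string ordering (tree's ordering may differ), and hidden entries are skipped as tree does without -a.

```python
import os


def render_tree(root, max_depth=None, dirs_only=False):
    """Return a list of lines resembling tree's output for root."""
    lines = [root]

    def walk(path, prefix, depth):
        # max_depth plays the role of -L: stop once the limit is exceeded.
        if max_depth is not None and depth > max_depth:
            return
        entries = sorted(e for e in os.listdir(path) if not e.startswith("."))
        if dirs_only:  # the role of -d: keep directories only
            entries = [e for e in entries if os.path.isdir(os.path.join(path, e))]
        for index, name in enumerate(entries):
            last = index == len(entries) - 1
            lines.append(prefix + ("└── " if last else "├── ") + name)
            full = os.path.join(path, name)
            if os.path.isdir(full) and not os.path.islink(full):
                walk(full, prefix + ("    " if last else "│   "), depth + 1)

    walk(root, "", 1)
    return lines


# Print the current directory two levels deep, directories only.
print("\n".join(render_tree(".", max_depth=2, dirs_only=True)))
```

Calling render_tree(".", max_depth=2) corresponds roughly to tree -L 2, and render_tree(".", dirs_only=True) to tree -d.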

Tree with file sizes

With the -h option, tree prints the size of each file and directory in a human-readable format.

tree -h
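
The idea behind a human-readable size is repeated division by 1024 until the number is small enough to read. The sketch below shows that idea in Python; the function name human_size is hypothetical, and the rounding rule (one decimal below 10, none above) is an assumption that may not match tree's output exactly.

```python
def human_size(num_bytes):
    """Format a byte count with K/M/G/T suffixes, using powers of 1024."""
    units = ["", "K", "M", "G", "T"]
    size = float(num_bytes)
    unit = 0
    while size >= 1024 and unit < len(units) - 1:
        size /= 1024
        unit += 1
    if unit == 0:
        return str(num_bytes)  # plain byte count, no suffix
    # One decimal for small values, none for larger ones.
    return f"{size:.1f}{units[unit]}" if size < 10 else f"{size:.0f}{units[unit]}"


for n in (512, 4096, 20 * 1024, 5 * 1024 ** 2):
    print(n, "->", human_size(n))
```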

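Ignoring specific files or directories

Sometimes you may want to leave out certain files or directories. The -I option does this. For example, to ignore all .txt files and anything named temp, run: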

tree -I '*.txt|temp'

Note: quote the pattern, as shown, so that the shell does not expand * or treat | as a pipe before tree receives it.
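
The pattern '*.txt|temp' is a list of alternatives separated by |, and a name is ignored if it matches any one of them. The Python sketch below approximates this with the standard fnmatch module; tree uses its own matcher, so treat this as an illustration of the logic rather than an exact reimplementation. The function name is_ignored and the sample file names are made up for the example.

```python
from fnmatch import fnmatchcase


def is_ignored(name, pattern):
    """Return True if name matches any '|'-separated alternative in pattern."""
    return any(fnmatchcase(name, alt) for alt in pattern.split("|"))


names = ["notes.txt", "temp", "main.py", "template"]
kept = [n for n in names if not is_ignored(n, "*.txt|temp")]
print(kept)  # "template" is kept because "temp" must match the whole name
```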

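More options

tree has many other options, which are documented in its manual page (man tree). The manual page describes what each option does and how to use it, and is a good resource for learning tree. An excerpt follows: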

--help         Outputs a verbose usage listing.
--version      Outputs the version of tree.
-a             All files are printed. By default, tree does not print hidden files (those beginning with a dot `.'). In no event does tree print the file system constructs `.' (current directory) and `..' (previous directory).
-d             List directories only.
-f             Prints the full path prefix for each file.
-i             Tree does not print the indentation lines. Useful when used in conjunction with the -f option.
-l             Follows symbolic links to directories as if they were directories. Links that would result in a recursive loop are avoided.
-x             Stay on the current file system only, as with find -xdev.
 
-P pattern     List only those files that match the wildcard pattern. Note: you must use the -a option to also consider those files beginning with a dot `.' for matching. Valid wildcard operators are `*' (any zero or more characters), `?' (any single character), `[...]' (any single character listed between brackets (optional - (dash) for character range may be used: ex: [A-Z]), and `[^...]' (any single character not listed in brackets) and `|' separates alternate patterns.
 
-I pattern     Do not list those files that match the wildcard pattern.
 
--prune           Makes tree prune empty directories from the output, useful when used in conjunction with -P or -I.
--filelimit #     Do not descend directories that contain more than # entries.
--timefmt format  Prints (implies -D) and formats the date according to the format string which uses the strftime syntax.
--noreport        Omits printing of the file and directory report at the end of the tree listing.
 
-p             Print the protections for each file (as per ls -l).
-s             Print the size of each file with the name.
-u             Print the username, or UID # if no username is available, of the file.
-g             Print the group name, or GID # if no group name is available, of the file.
-D             Print the date of the last modification time for the file listed.
--inodes       Prints the inode number of the file or directory
--device       Prints the device number to which the file or directory belongs
-F             Append a `/' for directories, a `=' for socket files, a `*' for executable files and a `|' for FIFO's, as per ls -F
-q             Print non-printable characters in file names as question marks instead of the default caret notation.
-N             Print non-printable characters as is instead of the default caret notation.
-r             Sort the output in reverse alphabetic order.
-t             Sort the output by last modification time instead of alphabetically.
--dirsfirst    List directories before files.
-n             Turn colorization off always, overridden by the -C option.
-C             Turn colorization on always, using built-in color defaults if the LS_COLORS environment variable is not set. Useful to colorize output to a pipe.
-A             Turn on ANSI line graphics hack when printing the indentation lines.
-S             Turn on ASCII line graphics (useful when using linux console mode fonts). This option is now equivalent to `--charset=IBM437' and will eventually be deprecated.
-L level       Max display depth of the directory tree.
-R             Recursively cross down the tree each level directories (see -L option), and at each of them execute tree again adding `-o 00Tree.html' as a new option.
-H baseHREF    Turn on HTML output, including HTTP references. Useful for ftp sites. baseHREF gives the base ftp location when using HTML output. That is, the local directory may be `/local/ftp/pub', but it must be referenced as `ftp://host-name.organization.domain/pub' (baseHREF should be `ftp://hostname.organization.domain'). Hint: don't use ANSI lines with this option, and don't give more than one directory in the directory list. If you want to use colors via CSS stylesheet, use the -C option in addition to this option to force color output.
 
-T title             Sets the title and H1 header string in HTML output mode.
--charset charset    Set the character set to use when outputting HTML and for line drawing.
--nolinks            Turns off hyperlinks in HTML output.
-o file name         Send output to file name.

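Summary

tree is a very useful Linux tool that presents directory structure in an intuitive way. After reading this article, you should know how to install tree on different Linux distributions and how to use its basic options, and you can use it to browse and manage your file system more efficiently.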


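1. Introduction

This article explains in detail how to use the tree command on Linux.

tree is a small, cross-platform command-line program that recursively lists the contents of a directory in a tree-like format. It prints the directory paths and files in each subdirectory, followed by a summary of the total number of subdirectories and files.

The tree program is available on Unix and Unix-like systems such as Linux, as well as on DOS, Windows, and many other operating systems. It offers a range of options for controlling output, from file options and sorting options to graphics options, and supports output in XML, JSON, and HTML formats.

In this tutorial, we use worked examples to show how tree recursively lists the contents of a directory on a Linux system.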

[Figure: detailed usage of the Linux tree command]

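2. Installing tree on various distributions

tree is available for almost all Linux distributions; if it is not installed by default, install it with your system's package manager as shown below.

2.1 Installing tree on RHEL/CentOS 7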

yum install tree

2.2 Installing tree on Fedora 22+ and RHEL/CentOS 8

dnf install tree

2.3 Installing tree on Ubuntu/Debian

sudo apt install tree

2.4 Installing tree on openSUSE

sudo zypper in tree

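3. Everyday tree usage examples

Once tree is installed, the following examples show how to use it.

To list directory contents in a tree-like format, change to the directory you want and run tree without any options or arguments, as shown below. Some directories require root privileges; use sudo to gain access to them.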

tree

[Figure: default output of the tree command]

sudo tree

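It recursively displays the contents of the working directory, showing subdirectories and files along with a summary of their totals. Use the -a flag to include hidden files.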

sudo tree -a

To print the full path prefix for each subdirectory and file, use -f, as shown below.

sudo tree -f

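[Figure: listing the full path of each subdirectory and file]

You can also tell tree to print only subdirectories, without the files inside them, using the -d option. Combined with -f, tree prints the full directory paths, as shown below.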

sudo tree -d

[Figure: tree printing subdirectories only, without the files inside them]

sudo tree -df

Use the -L option to set the maximum display depth of the directory tree. For example, for a depth of 2, run the following command.

sudo tree -f -L 2

The following example sets the maximum display depth to 3:

sudo tree -f -L 3

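To show only files that match a wildcard pattern, use the -P option followed by the pattern. In this example, the command lists only files matching cata*, such as catalina.sh and catalina.bat. Matching is case-sensitive by default, and the pattern is quoted so that the shell passes it to tree unexpanded.

sudo tree -f -P 'cata*'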

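[Figure: using tree to display directory contents in tree form]

You can also add the --prune option to tell tree to remove empty directories from the output, as shown below.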

sudo tree -f --prune
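
The interaction between -P and --prune is easier to see on a small in-memory example: files that do not match are dropped, and directories left with nothing inside are then removed. The Python sketch below models a directory tree as a nested dict (a dict is a directory, None is a file). The function name filter_and_prune, the dict layout, and the use of fnmatch in place of tree's own matcher are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def filter_and_prune(tree, pattern):
    """Keep files matching pattern and drop directories that end up empty."""
    kept = {}
    for name, child in tree.items():
        if child is None:  # a file: keep it only if it matches (the -P step)
            if fnmatchcase(name, pattern):
                kept[name] = None
        else:  # a directory: filter its contents first
            sub = filter_and_prune(child, pattern)
            if sub:  # the --prune step: skip directories with nothing left
                kept[name] = sub
    return kept


# Hypothetical layout, loosely modelled on the catalina example above.
layout = {
    "bin": {"catalina.sh": None, "catalina.bat": None, "startup.sh": None},
    "logs": {},
    "conf": {"server.xml": None},
}
result = filter_and_prune(layout, "cata*")
print(result)
```

Here logs disappears because it was empty to begin with, and conf disappears because none of its files match the pattern.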

tree also supports several useful file options, such as -p, which prints the file type and permissions of each file in the same way as ls -l.

sudo tree -f -p

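To print the username of each file's owner (or the UID if no username is available), use the -u option; the -g option prints the group name (or the GID if no group name is available). Combining -p, -u, and -g produces a long listing similar to ls -l, with details for each file and directory.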

sudo tree -f -pug
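
The same file metadata is available from Python's standard library, which helps when reading the -pug columns. This is a minimal sketch: the function name describe is hypothetical, and it prints numeric user and group IDs rather than names to stay portable, which differs from what tree shows when names are available.

```python
import os
import stat


def describe(path):
    """Return '[mode uid gid] name' for path, similar in spirit to tree -pug."""
    info = os.lstat(path)  # lstat reports on a symlink itself, not its target
    mode = stat.filemode(info.st_mode)  # e.g. '-rw-r--r--' or 'drwxr-xr-x'
    return f"[{mode} {info.st_uid} {info.st_gid}] {os.path.basename(path)}"


# Describe every entry in the current directory.
for name in sorted(os.listdir(".")):
    print(describe(os.path.join(".", name)))
```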

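The -s option prints the size of each file in bytes next to its name. For a more readable format, use the -h option, which appends a size letter for kilobytes (K), megabytes (M), gigabytes (G), terabytes (T), and so on.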

sudo tree -f -s

or

sudo tree -f -h

To display the date of the last modification for each subdirectory or file, use the -D option, as shown below.

sudo tree -f -pug -h -D

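[Figure: using tree to display directory contents hierarchically]

Another useful option is --du, which reports the size of each directory as the accumulated size of all its files and subdirectories.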

sudo tree -f --du
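
The accumulation that --du performs can be sketched as a recursive sum. The Python function below (the name dir_size is hypothetical) adds up the sizes of regular files only and does not follow symbolic links; tree may also count the size of the directory entries themselves, so its numbers can be larger than what this sketch returns.

```python
import os


def dir_size(path):
    """Sum the sizes of all regular files under path, recursively."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            total += dir_size(entry.path)  # accumulate the subdirectory's total
        elif entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_size
    return total


print(dir_size("."), "bytes in regular files under the current directory")
```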

You can also use the -o option to send tree's output to a file for later analysis.

sudo tree -o direc_tree.txt

4. Summary

That covers the tree command; run man tree to learn about further usage and options.

This article is also published on the author's blog. When reposting, please include the following link in the body: https://www.linuxrumen.com/cyml/1783.html

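lxml is a Python binding for the libxml2 parsing library. libxml2 is written in C and parses very quickly, although installing lxml can be slightly more involved. Installation instructions are available at (http://lxml.de/installation.html), and installation notes for CentOS 7 in Chinese at (http://www.cjavapy.com/article/64/). Using lxml to parse HTML fetched by a web crawler is very efficient. The lxml.html module is particularly well suited to HTML content: it parses large HTML files quickly and provides XPath and CSS selectors for querying and extracting data.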

Reference: https://www.cjavapy.com/article/65/

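I. Parsing potentially malformed HTML

HTML fetched from the web may be written to standard, with every tag closed and attributes properly formed, but some sites produce sloppy markup, and parsing it as-is can cause problems. It therefore helps to first normalize potentially invalid HTML into a consistent form so that later parsing steps are not affected.

1. lxml.html

lxml.html is the module dedicated to parsing and processing HTML documents. It builds on lxml.etree but is tuned to the characteristics of HTML. Because lxml.html can handle malformed HTML, it is especially useful for parsing and scraping web pages.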

>>> import lxml.html
>>> broken_html = '<ul class="body"><li>header<li>item</ul>'
>>> tree = lxml.html.fromstring(broken_html)  # parse the HTML
>>> fixed_html = lxml.html.tostring(tree, pretty_print=True)
>>> print(fixed_html.decode("utf-8"))  # tostring() returns bytes in Python 3
<ul class="body">
<li>header</li>
<li>item</li>
</ul>

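2. lxml.etree

lxml.etree is the lxml module for processing XML documents. It is built on the very fast libxml2 parser and offers an interface similar to the standard library's xml.etree.ElementTree API, with better performance and more features. lxml.etree supports advanced XML features such as XPath, XSLT, and schema validation.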

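>>> import lxml.etree
>>> broken_html = '<ul class="body"><li>header<li>item</ul>'
>>> # lxml.etree.fromstring() uses the strict XML parser and raises
>>> # XMLSyntaxError on this input, so use the HTML parser instead
>>> tree = lxml.etree.HTML(broken_html)
>>> fixed_html = lxml.etree.tostring(tree)
>>> print(fixed_html.decode("utf-8"))
<html><body><ul class="body"><li>header</li><li>item</li></ul></body></html>

As the examples show, lxml closes the unclosed <li> tags. lxml.html.fromstring returns the fragment as given, without adding <html> and <body> tags, whereas the HTML parser in lxml.etree wraps the fragment in <html> and <body>.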

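II. Working with HTML parsed by lxml

To find the content we want in the HTML, lxml offers several approaches. XPath selectors play a role similar to Beautiful Soup's find() method, and CSS selectors work much like jQuery selectors. Both can locate elements in a document, but each has its own strengths and use cases. XPath is a language for locating information in XML documents and can traverse their elements and attributes; CSS selectors are typically used to select elements in HTML documents.

1. XPath selectors (a leading single slash / starts an absolute path from the document root; a double slash // matches descendants at any depth)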

from lxml import etree
source_html = """
         <div>
            <ul>
                 <li class="item-0"><a href="link1.html">first item</a></li>
                 <li class="item-1"><a href="link2.html">second item</a></li>
                 <li class="item-inactive"><a href="link3.html">third item</a></li>
                 <li class="item-1"><a href="link4.html">fourth item</a></li>
                 <li class="item-0"><a href="link5.html">fifth item</a>
             </ul>
         </div>
        """
html = etree.HTML(source_html)
print(html)
result = etree.tostring(html)  # the parsed tree has missing HTML tags filled in
print(result.decode("utf-8"))

Output:

<Element html at 0x39e58f0>
<html><body><div>
<ul>
<li class="item-0"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</li></ul>
</div>
</body></html>

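1) Get the content of a tag (do not add a trailing slash after a, or XPath evaluation raises an error)

# First approach

html = etree.HTML(source_html)
html_data = html.xpath('/html/body/div/ul/li/a')  # absolute path
# html_data = html.xpath('//li/a')  # relative path
print(html)
for i in html_data:
    print(i.text)

Output: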

<Element html at 0x14fe6b8>
first item
second item
third item
fourth item
fifth item

# Second approach
# Appending /text() to the path returns the text content directly, so the results are strings and the .text attribute is no longer needed.

html = etree.HTML(source_html)
html_data = html.xpath('/html/body/div/ul/li/a/text()')  # absolute path
# html_data = html.xpath('//li/a/text()')  # relative path
print(html)
for i in html_data:
    print(i)

Output:

<Element html at 0x128e3b7>
first item
second item
third item
fourth item
fifth item

2) Get an attribute of the a tags

html = etree.HTML(source_html)
html_data = html.xpath('//li/a/@href')  # relative path
# html_data = html.xpath('/html/body/div/ul/li/a/@href')  # absolute path
for i in html_data:
    print(i)

Output:

link1.html
link2.html
link3.html
link4.html
link5.html

3) Find the a tag whose href attribute equals link2.html

html = etree.HTML(source_html)
html_data = html.xpath('/html/body/div/ul/li/a[@href="link2.html"]/text()')  # absolute path
# html_data = html.xpath('//li/a[@href="link2.html"]/text()')  # relative path
print(html_data)
for i in html_data:
    print(i)

Output:

['second item']
second item

4) Get the text of the a tag inside the last li tag

html = etree.HTML(source_html)
html_data = html.xpath('//li[last()]/a/text()')
print(html_data)
for i in html_data:
    print(i)

Output:

['fifth item']
fifth item

5) Get the text of the a tag inside the second-to-last li tag

html = etree.HTML(source_html)
html_data = html.xpath('//li[last()-1]/a/text()')
print(html_data)
for i in html_data:
    print(i)

Output:

['fourth item']
fourth item

6) Find the tag whose id attribute equals value

//*[@id="value"]
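
Several of the path expressions above can be tried without installing lxml, using the standard library's xml.etree.ElementTree. This is a sketch under clear limitations: ElementTree implements only a subset of XPath (no /text() or /@href steps, and paths are relative to the element they are called on), and its XML parser does not repair broken markup, so the list below is a well-formed copy of the earlier example. The id="value" attribute on the last li is added here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the list used above; id="value" is a hypothetical addition.
doc = ET.fromstring("""<div>
  <ul>
    <li class="item-0"><a href="link1.html">first item</a></li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-inactive"><a href="link3.html">third item</a></li>
    <li class="item-1"><a href="link4.html">fourth item</a></li>
    <li class="item-0" id="value"><a href="link5.html">fifth item</a></li>
  </ul>
</div>""")

# Text and attributes are read from the matched elements instead of the path.
texts = [a.text for a in doc.findall(".//li/a")]
hrefs = [a.get("href") for a in doc.findall(".//li/a")]

second = doc.find(".//li/a[@href='link2.html']").text   # attribute predicate
last = doc.find(".//li[last()]/a").text                 # last li
second_to_last = doc.find(".//li[last()-1]/a").text     # second-to-last li
by_id = doc.find(".//*[@id='value']")                   # any tag with id="value"

print(texts, hrefs, second, last, second_to_last, by_id.tag)
```

For real-world HTML, lxml remains the better fit because its HTML parser tolerates unclosed tags and its xpath() method supports full XPath 1.0.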

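7) Use the Chrome browser to extract the XPath of a tag

In Chrome DevTools, right-click the element in the Elements panel and choose Copy > Copy XPath.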

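2. CSS selectors (used in much the same way as jQuery selectors)

Selector	Description
*	Selects all tags
a	Selects <a> tags
.link	Selects all elements with class = 'link'
a.link	Selects <a> tags with class = 'link'
a#home	Selects the <a> tag with id = 'home'
a > span	Selects all <span> tags that are direct children of an <a> tag
a span	Selects all <span> tags anywhere inside an <a> tag

Example: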

>>> html = """<div>
<tr id="places_area_row" class="body">
<td>header</td>
<td class="w2p_fw">item1</td>
<td class="w2p_fw">item2</td>
<td class="w2p_fw">item3</td>
<td><tr><td class="w2p_fw">header</td>
<td class="w2p_fw">item4</td>
<td class="w2p_fw">item5</td>
<td class="w2p_fw">item6</td></tr></td>
</tr>
</div>"""
>>> import lxml.html  # cssselect() also requires the separate cssselect package
>>> tree = lxml.html.fromstring(html)
>>> td = tree.cssselect('tr#places_area_row > td.w2p_fw')[0]
>>> htmlText = td.text_content()
>>> print(htmlText)
item1
